Importing Data
ProtSpace uses .parquetbundle files containing protein embeddings and annotations.
Drag and Drop (Recommended)
The easiest way to load data:
- Locate your .parquetbundle file on your computer
- Drag it onto the scatterplot canvas
- Drop when you see the drop indicator
- Data loads automatically
Drop Anywhere
You can drop the file anywhere on the scatterplot area; it does not need to land on a specific spot.
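If you are not sure where your bundle files ended up, a short script can help with the "locate your file" step before you drag anything into ProtSpace. The following is a minimal sketch, not part of ProtSpace itself: the function name `find_bundles` is made up for illustration, and it only matches on the `.parquetbundle` file extension without inspecting file contents.

```python
from pathlib import Path

# Extension used by ProtSpace data files (taken from this guide).
BUNDLE_SUFFIX = ".parquetbundle"


def find_bundles(folder):
    """Return (path, size in MB) for every .parquetbundle file under folder.

    Hypothetical helper: it checks the file extension only and does not
    validate that the file is a well-formed bundle.
    """
    root = Path(folder)
    if not root.is_dir():
        # Nothing to search if the folder does not exist.
        return []

    results = []
    for path in sorted(root.rglob("*")):
        # Compare case-insensitively so "DATA.PARQUETBUNDLE" is also found.
        if path.is_file() and path.suffix.lower() == BUNDLE_SUFFIX:
            size_mb = path.stat().st_size / 1_000_000
            results.append((path, size_mb))
    return results
```

For example, `find_bundles(Path.home() / "Downloads")` would list any bundles in a typical downloads folder (the folder name is an assumption about your system), along with their sizes on disk.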
Import Button
Alternatively, use the Import button in the control bar:
- Click the Import button in the top-right corner
- Select your .parquetbundle file from the file picker
- Click Open
Example Datasets
Don't have data yet? Download example .parquetbundle files from the GitHub data folder.
What Happens When You Load Data
After successfully loading a file:
- Scatterplot populates: All proteins appear as colored points
- First projection loads: Typically PCA if available
- Colors are assigned: Based on the first annotation in your data
- Legend appears: Shows all categories with color assignments
- Ready to explore: You can now pan, zoom, and interact with the data
Loading Time
Small datasets (fewer than 10,000 proteins) typically load almost instantly. Larger datasets may take a few seconds to process and render.
Need a Data File?
To create your own .parquetbundle files:
- Using Google Colab - No installation required (recommended)
- Using Python CLI - For local processing or automation
Or download example datasets from the GitHub data folder.