Getting Started with HPC at GRIT
High Performance Computing (HPC) gives you access to large amounts of CPU, RAM, and storage that far exceed what's available on a laptop or desktop. This guide covers the essentials: connecting, moving data, running jobs with SLURM, managing conda environments, and handling long-term storage with Nextcloud.
Connecting to the HPC
The primary entry point is hpc.grit.ucsb.edu, which serves as the head node for the GRIT HPC cluster. You don't need to know or choose which physical machine runs your work — when you log in and request resources, SLURM automatically forwards your session to an available node based on what you've requested (CPU cores, RAM, GPU, etc.).
The easiest way to get started is through OpenOnDemand at https://hpc.grit.ucsb.edu, which provides browser-based access to JupyterHub, RStudio, VS Code and a Linux desktop — no SSH or local setup required.
Access is restricted to campus IP addresses. For off-campus connections, you'll need to set up SSH keys and MFA — full instructions are in the SSH setup guide.
Storage: What Goes Where
All GRIT systems share the same home directory over the network (/home/<user>), so your files are visible across every machine you can access. This is convenient but has a performance tradeoff: for jobs that read or write large amounts of data, network-mounted storage creates a bottleneck.
Moving Data with rsync
rsync is the standard tool for moving data to and from HPC systems. It's fast, resumable, and only transfers files that have changed.
Basic syntax:
rsync -avr <source> <destination>
The flags -avr stand for archive (preserves permissions and timestamps), verbose (shows what's happening), and recursive (descends into subdirectories). Archive mode already implies recursion, so the -r is redundant but harmless.
Copy local data to the HPC:
rsync -avr /local/data/ username@hpc.grit.ucsb.edu:/scratch/mylab/data/
One critical detail — trailing slashes matter:
# Copies the *contents* of /data/ into /destination/
rsync -avr /data/ username@hpc.grit.ucsb.edu:/destination/
# Creates a folder 'data' inside /destination/
rsync -avr /data username@hpc.grit.ucsb.edu:/destination/
When in doubt, do a dry run first to preview what will transfer:
rsync -avr --dry-run /data username@hpc.grit.ucsb.edu:/destination/
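The trailing-slash rule can be modeled in a few lines of Python. This is only an illustrative sketch: remote_path_for is a hypothetical helper, not part of rsync, and it simply predicts where a file would land on the destination.

```python
from pathlib import PurePosixPath


def remote_path_for(source, destination, relative_file):
    """Predict where a file under `source` ends up inside `destination`.

    Hypothetical helper for illustration only: it mimics rsync's
    trailing-slash rule and never calls rsync itself.
    """
    dest = PurePosixPath(destination)
    if source.endswith("/"):
        # Trailing slash: the *contents* of the source directory are copied.
        return str(dest / relative_file)
    # No trailing slash: the source directory itself is recreated in the destination.
    return str(dest / PurePosixPath(source).name / relative_file)


print(remote_path_for("/data/", "/destination/", "run1/results.csv"))
print(remote_path_for("/data", "/destination/", "run1/results.csv"))
```

The first call prints a path directly under /destination/, while the second adds an extra data/ level. A dry run remains the authoritative check.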
For long transfers, wrap your rsync command in screen so it keeps running if your connection drops:
screen rsync -avr /large/dataset/ username@hpc.grit.ucsb.edu:/scratch/mylab/
# Detach with Ctrl+A, then D; reconnect later with: screen -r
Accessing Your Files with Nextcloud
GRIT's Nextcloud instance is a browser-based GUI for accessing your GRIT storage. Think of it as a file manager for the same /home/<user> directory you use on the HPC — your home folder follows you across all GRIT services, so files you create or modify on the HPC are immediately visible in Nextcloud and vice versa. There's no syncing or transferring required between them.
This makes Nextcloud useful for:
- Browsing, uploading, and downloading files without using the command line
- Sharing files or folders with collaborators inside or outside UCSB directly from your GRIT storage
- Quickly moving data from your laptop into your HPC home directory through a familiar drag-and-drop interface
Nextcloud is not a separate storage system — it's a window into the same storage your HPC jobs already read and write from.
To access GRIT Nextcloud, visit: https://nextcloud.grit.ucsb.edu/
Note: The first time you sign in, Nextcloud mounts your GRIT storage. If you see an error message, sign out and then sign back in.
Interactive Apps via OpenOnDemand
OpenOnDemand (OOD) isn't just a portal for submitting batch jobs — it also lets you launch full interactive development environments directly through your browser, all backed by SLURM.
Available Interactive Apps
From the Interactive Apps menu in OOD, you can launch:
- RStudio — a full RStudio IDE session running on an HPC node
- Jupyter Notebooks — a Jupyter environment with access to your conda kernels and HPC storage
- VS Code — browser-based VS Code with terminal access to the cluster
- Linux Desktop — a full graphical desktop session for applications that require a GUI
How It Works
Each interactive app session is submitted to SLURM just like a batch job. When you request a session, you specify the resources you need (CPUs, RAM, GPUs, wall time), and SLURM allocates a node accordingly. Once your session starts, OOD connects your browser directly to that node.
This means you get the interactivity of a local IDE with the compute power of the cluster — useful for exploratory analysis on large datasets, debugging code before scaling it to a batch job, or running computationally intensive notebooks that would time out or crash locally.
Tips
- Request only what you need. Interactive sessions hold resources for their entire duration, even when you're not actively using them. Other users are waiting for those same resources.
- Save your work before the session ends. Output written to your home directory persists, but anything held in memory is lost when the session terminates.
Running Jobs with SLURM
SLURM (Simple Linux Utility for Resource Management) is the job scheduler used on GRIT HPC systems. Rather than running code directly and competing for resources with other users, you submit jobs to a queue. SLURM tracks available resources across all nodes and runs your job on whichever node can fulfill your request.
The Typical Workflow
- Develop and test your code locally on a small data subset
- Update file paths for the HPC environment
- Write a SLURM job script
- Submit to the queue with sbatch
- Monitor until complete
Writing a SLURM Job Script
SLURM job scripts are bash scripts with #SBATCH directives at the top that specify your resource requests. SLURM uses these to find an appropriate node — if resources aren't immediately available, your job waits in the queue until they are.
Minimal example:
#!/bin/bash
#SBATCH --partition=grit_nodes   # the queue (use 'basic' on some systems)
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4        # number of CPU cores
#SBATCH --mem=16G                # total RAM requested
#SBATCH --time=02:00:00          # max run time (HH:MM:SS)
#SBATCH --job-name=my_analysis
#SBATCH --output=%x-%j.out       # stdout log (%x=job name, %j=job ID)
#SBATCH --error=%x-%j.err        # stderr log

# Your code here
python run_analysis.py
To request a GPU (SLURM will route the job to a GPU-equipped node automatically):
#SBATCH --gres=gpu:1
For R:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=8G
#SBATCH --time=01:00:00
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err

Rscript my_analysis.R
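If you submit many similar jobs, the directives shown above can be generated rather than hand-edited. The following is a minimal sketch, not a GRIT-provided tool: build_job_script and its parameter names are hypothetical, and the defaults simply mirror the minimal example.

```python
def build_job_script(command, job_name="my_analysis", cpus=4, mem="16G",
                     time="02:00:00", partition=None, gpus=0):
    """Return the text of a SLURM batch script (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        "#SBATCH --nodes=1",
        "#SBATCH --ntasks=1",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem}",
        f"#SBATCH --time={time}",
        f"#SBATCH --job-name={job_name}",
        "#SBATCH --output=%x-%j.out",
        "#SBATCH --error=%x-%j.err",
    ]
    if partition:
        lines.append(f"#SBATCH --partition={partition}")
    if gpus:
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
    # All #SBATCH lines must come before the first executable command.
    lines += ["", command, ""]
    return "\n".join(lines)


script = build_job_script("python run_analysis.py", partition="grit_nodes")
print(script)
```

Writing the result to a file and checking it with sbatch --test-only before submitting is a reasonable way to confirm the generated directives are accepted.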
Submitting and Monitoring Jobs
# Test your submission without running it
sbatch --test-only my_job.sh
# Submit for real
sbatch my_job.sh
# See the full queue
squeue -a
# See only your jobs
squeue -u $USER
# Get details on a specific job
scontrol show job <jobid>
# Watch the log file in real time
tail -f my_analysis-12345.out
# Cancel a job
scancel <jobid>
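For scripted monitoring, squeue output can be parsed once it is requested in a delimiter-separated format. The sketch below assumes the queue was queried with squeue -u $USER --noheader --format="%i|%T|%j" (job ID, state, job name); parse_squeue is a hypothetical helper and the sample text is made up for illustration.

```python
def parse_squeue(output):
    """Parse lines of the form 'jobid|STATE|name' into dictionaries."""
    jobs = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # The job name comes last so that a '|' inside a name cannot break parsing.
        job_id, state, name = line.split("|", 2)
        jobs.append({"id": job_id, "state": state, "name": name})
    return jobs


# Made-up sample in the assumed format; on the cluster this text would
# come from running squeue with the --format string given above.
sample = "12345|RUNNING|my_analysis\n12346|PENDING|species-model\n"
jobs = parse_squeue(sample)
pending = [job["id"] for job in jobs if job["state"] == "PENDING"]
print(pending)
```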
Get Email When Your Job Finishes
Add these lines to your job script:
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=yournetid@ucsb.edu
Scheduling Recurring Jobs with scrontab
SLURM has a built-in cron-like scheduler called scrontab for jobs that need to run on a schedule:
scrontab -e
The syntax matches standard cron:
# Run a script every day at 5am
0 5 * * * /home/user/scripts/daily_data_pull.sh
Unlike regular cron, scrontab jobs are managed through SLURM — they're queued like any other job, so they respect resource limits and the fair-share scheduler.
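To make the five time fields in the entry above explicit, here is a small sketch that splits a cron-style line into named fields and the command. split_cron_entry is a hypothetical helper for illustration; it handles only the plain five-field form and does not validate ranges or shortcut keywords.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron_entry(entry):
    """Split 'm h dom mon dow command...' into (schedule dict, command)."""
    parts = entry.split(None, 5)  # five time fields, then the rest is the command
    if len(parts) < 6:
        raise ValueError("expected five time fields followed by a command")
    schedule = dict(zip(CRON_FIELDS, parts[:5]))
    return schedule, parts[5]


schedule, command = split_cron_entry(
    "0 5 * * * /home/user/scripts/daily_data_pull.sh"
)
print(schedule)
print(command)
```

Here minute is 0 and hour is 5, with the remaining fields left as wildcards, which is what "every day at 5am" means.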
Monitoring Jobs with the Resource Utilization Analyzer
Requesting the right amount of resources is one of the harder parts of HPC workflows. Ask for too little and your job fails; ask for too much and you block other users — and your job may wait longer in the queue because fewer nodes can satisfy the request.
The Job Resource Utilization Analyzer, available in OpenOnDemand under Jobs, helps you understand how your jobs actually used the resources they were allocated. For each completed job, it shows:
- CPU efficiency — how much of your requested CPU time was actually used
- Memory efficiency — peak memory usage vs. what you requested
- GPU utilization (for GPU jobs) — whether your GPU was kept busy
Use this tool after your first few runs to right-size your resource requests. A job that uses 4 GB of a 32 GB allocation is wasting cluster capacity and may be penalized by the fair-share scheduler over time.
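The efficiency figures are simple ratios. The sketch below uses the conventional definitions (CPU time used divided by elapsed time times cores, and peak memory divided by requested memory); that the GRIT analyzer computes its numbers exactly this way is an assumption, and the function names are hypothetical.

```python
def cpu_efficiency(cpu_seconds_used, elapsed_seconds, cores):
    """Percentage of the allocated core-seconds that did useful work."""
    return 100.0 * cpu_seconds_used / (elapsed_seconds * cores)


def memory_efficiency(peak_gb, requested_gb):
    """Percentage of the requested memory that was used at peak."""
    return 100.0 * peak_gb / requested_gb


# The 4 GB of 32 GB case from the text above:
print(f"{memory_efficiency(4, 32):.1f}%")        # 12.5%
# A single-threaded program that ran for one hour on a 4-core allocation:
print(f"{cpu_efficiency(3600, 3600, 4):.1f}%")   # 25.0%
```

In both cases, cutting the request to roughly match actual use (with some headroom) would make the job easier to schedule.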
Installing Miniforge3
Since conda is not available as a system module on GRIT HPC, each user installs their own Miniforge3 in their home directory. You only need to do this once.
Miniforge3 is preferred over Miniconda because it defaults to the conda-forge channel — a fully open-source, community-maintained package repository with broad scientific package coverage and no licensing restrictions. It also comes with mamba, a faster drop-in replacement for conda that handles complex environment resolution much more quickly.
Installation
Download and run the Miniforge3 installer from an HPC shell session:
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh
During installation:
- Accept the license agreement
- When prompted for an install location, accept the default (/home/<user>/miniforge3)
- When asked whether to run conda init, say yes. This adds the necessary initialization block to your .bashrc, which is required for conda to work in SLURM job scripts
Once installation completes, reload your shell:
source ~/.bashrc
Verify the install:
conda --version
mamba --version
Keeping Your Home Directory Clean
Conda environments and package caches can grow large quickly. A few habits help keep your quota in check:
Store environments in the default location (~/miniforge3/envs/) — they'll be accessible from any node since home directories are network-mounted.
Clean the package cache periodically. Conda caches downloaded packages even after installation, which can silently consume several GB:
mamba clean --all
Remove environments you no longer need:
conda env remove -n old_env
Check your disk usage:
du -sh ~/miniforge3/
zfs list   # check overall quota
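To see which environments take the most space, a per-environment breakdown can help. This is a rough sketch assuming the default ~/miniforge3/envs location; dir_size_bytes and env_sizes are hypothetical helpers. It sums apparent file sizes, so it will tend to overestimate compared with du because conda hard-links many files from its package cache.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of regular files under `path`, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def env_sizes(envs_dir):
    """Map each environment directory name to its size, largest first."""
    sizes = {}
    for entry in os.scandir(envs_dir):
        if entry.is_dir(follow_symlinks=False):
            sizes[entry.name] = dir_size_bytes(entry.path)
    return dict(sorted(sizes.items(), key=lambda item: item[1], reverse=True))


envs_dir = os.path.expanduser("~/miniforge3/envs")  # assumed default location
if os.path.isdir(envs_dir):
    for name, size in env_sizes(envs_dir).items():
        print(f"{size / 1e9:6.2f} GB  {name}")
```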
Creating and Managing Environments
If you have an environment.yml file that defines a specific conda environment, you can run the following in the same directory:
conda env create -f environment.yml
Create a new environment with a specific Python version:
conda create -n myenv python=3.11
Activate and install packages — use mamba instead of conda for faster installs:
conda activate myenv
mamba install numpy pandas matplotlib
Export an environment for reproducibility:
conda env export > environment.yml
List all your environments:
conda env list
Note: Always create and test environments interactively before referencing them in a SLURM job script. See the Conda Environments in SLURM Jobs section below for how to activate them correctly in batch jobs.
Conda Environments in SLURM Jobs
This is one of the most common points of confusion for new HPC users. When SLURM runs your job script, it starts a non-interactive shell that does not automatically load your conda setup. If you just call conda activate myenv without sourcing your shell configuration first, the job will fail with a "conda: command not found" error or silently run in the base environment.
The fix is to source your .bashrc before activating any environment:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=32G
#SBATCH --time=04:00:00
#SBATCH --job-name=species-model
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

# Required: initialize conda in the non-interactive shell
source ~/.bashrc

# Activate your environment
conda activate myenv

# Run your code
python species_distribution_model.py
A few things to keep in mind:
Your environment must exist on the node. Because home directories are network-mounted and shared across all nodes, conda environments stored in /home/<user>/ are accessible from any node SLURM assigns you. Environments stored in scratch are not, since scratch is local to each machine.
Avoid installing packages in your job script. conda install during a running job is slow, can fail in a queue environment, and will affect other jobs using the same environment. Create and test your environment interactively first, then reference it in your job script.
Check that the right environment is active by adding a quick sanity check at the top of your script:
source ~/.bashrc
conda activate myenv
echo "Python: $(which python)"
echo "Env: $CONDA_DEFAULT_ENV"
This gets logged to your .out file and saves a lot of debugging time if something goes wrong.
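The same check can also live at the top of the Python program itself, which helps when one job script launches several programs. A minimal sketch follows; describe_environment is a hypothetical helper, and the SLURM variables are only set when the code runs inside a SLURM job (SLURM_CPUS_PER_TASK only when --cpus-per-task was requested).

```python
import os
import sys


def describe_environment():
    """Collect the interpreter path and relevant conda/SLURM variables."""
    return {
        "python": sys.executable,
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV", "(none)"),
        "slurm_job_id": os.environ.get("SLURM_JOB_ID", "(not in a SLURM job)"),
        "cpus_per_task": os.environ.get("SLURM_CPUS_PER_TASK", "(unset)"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If the python path printed here does not sit inside the expected environment directory, the activation step in the job script did not take effect.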
Registering a Conda Environment as a Jupyter Kernel
If you want to use a conda environment in JupyterLab or a Jupyter Notebook — including sessions launched via OpenOnDemand — you need to register it as a kernel. Just activating an environment is not enough; Jupyter maintains its own list of kernels separately.
With your environment activated, install ipykernel and register it:
conda activate myenv
mamba install ipykernel
python -m ipykernel install --user --name myenv --display-name "Python (myenv)"
- --name is the internal identifier (no spaces)
- --display-name is what appears in the JupyterLab kernel picker, so make it descriptive
After registering, the kernel will appear in JupyterLab automatically the next time you start or refresh a session. No restart of JupyterHub is required.
List registered kernels:
jupyter kernelspec list
Remove a kernel you no longer need:
jupyter kernelspec remove myenv
Removing a kernel only unregisters it from Jupyter — it does not delete the conda environment itself.
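A registered kernel is just a small kernel.json file, so it is easy to check which Python interpreter each kernel will actually launch. The sketch below assumes the usual Linux per-user location for kernels installed with --user (~/.local/share/jupyter/kernels); list_user_kernels is a hypothetical helper, and jupyter kernelspec list remains the authoritative listing.

```python
import json
from pathlib import Path


def list_user_kernels(kernels_dir):
    """Map kernel name -> display name and interpreter from kernel.json files."""
    kernels = {}
    for spec_file in sorted(Path(kernels_dir).glob("*/kernel.json")):
        spec = json.loads(spec_file.read_text())
        kernels[spec_file.parent.name] = {
            "display_name": spec.get("display_name", ""),
            # argv[0] is the executable Jupyter starts for this kernel.
            "python": (spec.get("argv") or [""])[0],
        }
    return kernels


default_dir = Path.home() / ".local/share/jupyter/kernels"  # assumed location
if default_dir.is_dir():
    for name, info in list_user_kernels(default_dir).items():
        print(f"{name}: {info['display_name']} -> {info['python']}")
```

For a kernel registered from myenv, the interpreter path should point inside that environment's directory.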
LLM Server: AI Tools on the HPC
GRIT provides access to a locally hosted LLM server running on the HPC cluster. This allows you to use large language models — for coding assistance, writing, data analysis, and more — without sending your data to external services like OpenAI or Anthropic.
Why This Matters for Research Data
When you use a commercial LLM API or web interface, your prompts and any data you paste in are transmitted to a third-party server. For research involving sensitive data — human subjects data, proprietary datasets, pre-publication findings, or anything covered by a data use agreement — this can create compliance and confidentiality risks.
The GRIT LLM server keeps everything local: the models run on HPC hardware, your prompts never leave the UCSB network, and no data is retained or used for model training by an outside vendor. This makes it appropriate for use with research data that you wouldn't feel comfortable pasting into a public chat interface.
Accessing the LLM Server
The LLM server is accessible through OpenOnDemand or directly via API at the GRIT-hosted endpoint. Available models are updated periodically — check the GRIT documentation for the current model list and connection instructions.
You can interact with the models through:
- A browser-based chat interface for conversational use: https://llm.grit.ucsb.edu
- The API endpoint for programmatic access from Python, R, or other tools (compatible with the OpenAI client library)
Using the API in Your Code
The GRIT LLM API is OpenAI-compatible, so you can use the standard openai Python library by pointing it at the local endpoint:
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.grit.ucsb.edu/v1",  # GRIT endpoint
    api_key="your-grit-api-key",
)

response = client.chat.completions.create(
    model="llama3",  # check current available models in GRIT docs
    messages=[{"role": "user", "content": "Summarize this dataset description..."}],
)
print(response.choices[0].message.content)
This same pattern works in R via the httr2 or ellmer packages.
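For tools without an OpenAI client, it may help to see what an OpenAI-style request looks like on the wire. The sketch below builds such a request with the Python standard library only and does not send it. The /chat/completions path and Bearer-token header follow the OpenAI convention; that the GRIT endpoint accepts exactly this form is an assumption, and build_chat_request is a hypothetical helper.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, prompt):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request(
    "https://llm.grit.ucsb.edu/v1",  # endpoint from the example above
    "your-grit-api-key",             # placeholder key
    "llama3",                        # placeholder; check the current model list
    "Summarize this dataset description...",
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it; the reply is expected to be JSON with the generated text under choices[0].message.content, matching what the openai library exposes.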
Quick Reference
| Task | Command |
|---|---|
| Submit a job | sbatch my_job.sh |
| Check your jobs | squeue -u $USER |
| View full queue | squeue -a |
| Cancel a job | scancel <jobid> |
| Watch job output | tail -f jobname-id.out |
| Check disk space | zfs list |
| Copy data to HPC | rsync -avr /local/data/ user@hpc.grit.ucsb.edu:/remote/path/ |
| Dry-run rsync | rsync -avr --dry-run <src> <dst> |
| Schedule recurring job | scrontab -e |