Getting Started with HPC at GRIT

High Performance Computing (HPC) gives you access to large amounts of CPU, RAM, and storage that far exceed what's available on a laptop or desktop. This guide covers the essentials: connecting, moving data, accessing your files through Nextcloud, running jobs with SLURM, managing conda environments, and using GRIT's locally hosted LLM server.

Connecting to the HPC

The primary entry point is hpc.grit.ucsb.edu, which serves as the head node for the GRIT HPC cluster. You don't need to know or choose which physical machine runs your work. When you submit a job or start an interactive session, SLURM places it on an available node that can satisfy your request (CPU cores, RAM, GPU, etc.).

The easiest way to get started is through OpenOnDemand at https://hpc.grit.ucsb.edu, which provides browser-based access to JupyterHub, RStudio, VS Code and a Linux desktop — no SSH or local setup required.

Access is restricted to campus IP addresses. For off-campus connections, you'll need to set up SSH keys and MFA — full instructions are in the SSH setup guide.

Storage: What Goes Where

All GRIT systems share the same home directory over the network (/home/<user>), so your files are visible across every machine you can access. This is convenient but has a performance tradeoff: for jobs that read or write large amounts of data, network-mounted storage creates a bottleneck.
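A common way around that bottleneck is to stage input data onto the compute node's local disk at the start of a job, work there, and copy results back at the end. The Python sketch below shows the pattern. It assumes the node-local space is whatever Python's tempfile module picks (the $TMPDIR variable if set, otherwise /tmp), which may not match GRIT's actual local scratch location. The names stage_in, stage_out, and cleanup are hypothetical helpers.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def stage_in(src: Path, scratch_root: Optional[str] = None) -> Path:
    """Copy a directory from network storage into a fresh local working directory.

    scratch_root=None lets tempfile choose the location ($TMPDIR, else /tmp).
    """
    workdir = Path(tempfile.mkdtemp(prefix="job-", dir=scratch_root))
    local_copy = workdir / src.name
    shutil.copytree(src, local_copy)
    return local_copy


def stage_out(local_results: Path, dest: Path) -> None:
    """Copy results back to network storage (e.g. your home directory)."""
    shutil.copytree(local_results, dest, dirs_exist_ok=True)


def cleanup(local_copy: Path) -> None:
    """Delete the whole local working directory so the node's disk doesn't fill up."""
    shutil.rmtree(local_copy.parent, ignore_errors=True)
```

In a job you would call `local = stage_in(Path.home() / "project" / "data")`, run your analysis against `local`, write outputs into a subfolder such as `local / "results"`, then call `stage_out(local / "results", Path.home() / "project" / "results")` followed by `cleanup(local)`. All of these paths are hypothetical.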

Moving Data with rsync

rsync is the standard tool for moving data to and from HPC systems. It's fast, resumable, and only transfers files that have changed.

Basic syntax:

rsync -avr <source> <destination>

The flags -avr stand for archive (preserves permissions, timestamps, and symbolic links), verbose (shows what's happening), and recursive (descends into subdirectories). Strictly speaking, -a already implies -r, so -av behaves the same.

Copy local data to the HPC:

rsync -avr /local/data/ username@hpc.grit.ucsb.edu:/scratch/mylab/data/

One critical detail — trailing slashes matter:

# Copies the *contents* of /data/ into /destination/
rsync -avr /data/ username@hpc.grit.ucsb.edu:/destination/

# Creates a folder 'data' inside /destination/
rsync -avr /data username@hpc.grit.ucsb.edu:/destination/
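To make the rule concrete, here is a toy Python model of where a file inside the source ends up at the destination. It is not part of rsync. The name rsync_destination is hypothetical, and the function handles plain paths only, without host prefixes.

```python
import posixpath


def rsync_destination(src: str, dest: str, relpath: str) -> str:
    """Where the file src/relpath ends up under dest, per rsync's trailing-slash rule.

    Toy model for illustration; it ignores host prefixes such as user@host:.
    """
    dest = dest.rstrip("/") or "/"
    if src.endswith("/"):
        # Trailing slash on the source: copy its *contents* into dest
        return posixpath.join(dest, relpath)
    # No trailing slash: rsync recreates the source directory itself inside dest
    return posixpath.join(dest, posixpath.basename(src), relpath)


print(rsync_destination("/data/", "/destination/", "a.csv"))  # /destination/a.csv
print(rsync_destination("/data", "/destination/", "a.csv"))   # /destination/data/a.csv
```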

When in doubt, do a dry run first to preview what will transfer:

rsync -avr --dry-run /data username@hpc.grit.ucsb.edu:/destination/

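For long transfers, start rsync inside screen on the machine you're transferring from. The transfer then keeps running if you close your terminal or lose your SSH session to that machine. If the network link itself drops, rsync stops. Rerun the same command and it will skip files that already arrived intact: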

screen
rsync -avr /large/dataset/ username@hpc.grit.ucsb.edu:/scratch/mylab/
# Detach with Ctrl+A, D — reconnect later with: screen -r

Accessing Your Files with Nextcloud

GRIT's Nextcloud instance is a browser-based GUI for accessing your GRIT storage. Think of it as a file manager for the same /home/<user> directory you use on the HPC — your home folder follows you across all GRIT services, so files you create or modify on the HPC are immediately visible in Nextcloud and vice versa. There's no syncing or transferring required between them.

This makes Nextcloud useful for:

Browsing, uploading, and downloading files without using the command line
Sharing files or folders with collaborators inside or outside UCSB directly from your GRIT storage
Quickly moving data from your laptop into your HPC home directory through a familiar drag-and-drop interface

Nextcloud is not a separate storage system — it's a window into the same storage your HPC jobs already read and write from.

To access GRIT Nextcloud, visit: https://nextcloud.grit.ucsb.edu/

Note: The first time you sign in, Nextcloud mounts your GRIT storage. If you see an error message, sign out and then sign back in.

Interactive Apps via OpenOnDemand

OpenOnDemand (OOD) isn't just a portal for submitting batch jobs — it also lets you launch full interactive development environments directly through your browser, all backed by SLURM.

Available Interactive Apps

From the Interactive Apps menu in OOD, you can launch:

RStudio — a full RStudio IDE session running on an HPC node
Jupyter Notebooks — a Jupyter environment with access to your conda kernels and HPC storage
VS Code — browser-based VS Code with terminal access to the cluster
Linux Desktop — a full graphical desktop session for applications that require a GUI

How It Works

Each interactive app session is submitted to SLURM just like a batch job. When you request a session, you specify the resources you need (CPUs, RAM, GPUs, wall time), and SLURM allocates a node accordingly. Once your session starts, OOD connects your browser directly to that node.

This means you get the interactivity of a local IDE with the compute power of the cluster — useful for exploratory analysis on large datasets, debugging code before scaling it to a batch job, or running computationally intensive notebooks that would time out or crash locally.

Tips

Request only what you need. Interactive sessions hold their resources for the entire duration, even while you're not actively using them, and other users may be waiting for those same resources.
Save your work before the session ends. Files saved to your home directory persist, but anything held only in memory (unsaved notebooks, objects in your R or Python session) is lost when the session terminates.

Running Jobs with SLURM

SLURM (Simple Linux Utility for Resource Management) is the job scheduler used on GRIT HPC systems. Rather than running code directly and competing for resources with other users, you submit jobs to a queue. SLURM tracks available resources across all nodes and runs your job on whichever node can fulfill your request.

The Typical Workflow

Develop and test your code locally on a small data subset
Update file paths for the HPC environment
Write a SLURM job script
Submit to the queue with sbatch
Monitor until complete
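For step 2, one way to avoid editing paths by hand is to read them from an environment variable that your job script sets. The sketch below is minimal. DATA_DIR and observations.csv are hypothetical names, not GRIT conventions.

```python
import os
from pathlib import Path


def data_dir(default: str = "data") -> Path:
    """Return $DATA_DIR if set (e.g. exported in your SLURM script), else a local default."""
    return Path(os.environ.get("DATA_DIR", default)).expanduser()


if __name__ == "__main__":
    input_file = data_dir() / "observations.csv"  # hypothetical file name
    print(f"Reading from {input_file}")
```

Locally the script reads from ./data. On the HPC you would add a line such as `export DATA_DIR=/scratch/mylab/data` to the job script before the `python` command, and the code itself stays unchanged.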

Writing a SLURM Job Script

SLURM job scripts are bash scripts with #SBATCH directives at the top that specify your resource requests. SLURM stops reading #SBATCH lines at the first executable command, so keep all directives above your code. SLURM uses them to find an appropriate node. If resources aren't immediately available, your job waits in the queue until they are.

Minimal example:

#!/bin/bash
#SBATCH --partition=grit_nodes   # the queue (use 'basic' on some systems)
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4        # number of CPU cores
#SBATCH --mem=16G                # total RAM requested
#SBATCH --time=02:00:00          # max run time (HH:MM:SS)
#SBATCH --job-name=my_analysis
#SBATCH --output=%x-%j.out       # stdout log (%x=job name, %j=job ID)
#SBATCH --error=%x-%j.err        # stderr log

# Your code here
python run_analysis.py

To request a GPU, add this directive to the #SBATCH block (SLURM routes the job to a GPU-equipped node automatically):

#SBATCH --gres=gpu:1
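Requesting cores only helps if your code actually uses them. When --cpus-per-task is set, SLURM exports it to the job as SLURM_CPUS_PER_TASK. The minimal sketch below reads that variable in Python so thread and process counts match your request. Don't rely on os.cpu_count() for this, because it reports every core on the node. The allocated_cpus helper is a hypothetical name.

```python
import os


def allocated_cpus(default: int = 1) -> int:
    """Cores granted by --cpus-per-task, or `default` when running outside SLURM."""
    value = os.environ.get("SLURM_CPUS_PER_TASK", "")
    return int(value) if value.isdigit() else default


if __name__ == "__main__":
    n = allocated_cpus()
    # Cap the thread pools used by NumPy/SciPy's linear-algebra backends.
    # These must be set before numpy is imported to take effect.
    for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
        os.environ.setdefault(var, str(n))
    print(f"Using {n} core(s); os.cpu_count() reports {os.cpu_count()} on this node")
```

The same value can be passed wherever a library asks for a worker count, for example `multiprocessing.Pool(n)` or scikit-learn's `n_jobs=n`.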

For R:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=8G
#SBATCH --time=01:00:00
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err

Rscript my_analysis.R

Submitting and Monitoring Jobs

# Test your submission without running it
sbatch --test-only my_job.sh

# Submit for real
sbatch my_job.sh

# See the full queue (all users; -a also shows hidden partitions)
squeue -a

# See only your jobs
squeue -u $USER

# Get details on a specific job
scontrol show job <jobid>

# Watch the log file in real time
tail -f my_analysis-12345.out

# Cancel a job
scancel <jobid>
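If you drive submissions from Python (for example, to launch one job per input file), sbatch's --parsable flag helps. It prints just the job ID, followed by ;cluster on multi-cluster setups, which is easier to capture than the usual "Submitted batch job" message. The sketch below is minimal, and submit and parse_job_id are hypothetical helper names.

```python
import subprocess


def parse_job_id(sbatch_output: str) -> int:
    """Extract the job ID from `sbatch --parsable` output ("12345" or "12345;cluster")."""
    return int(sbatch_output.strip().split(";")[0])


def submit(script: str, *extra_args: str) -> int:
    """Submit a job script and return its job ID; raises CalledProcessError if sbatch fails."""
    result = subprocess.run(
        ["sbatch", "--parsable", *extra_args, script],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_job_id(result.stdout)
```

For example, `first = submit("my_job.sh")` followed by `submit("step2.sh", f"--dependency=afterok:{first}")` holds the second job until the first finishes successfully. Here step2.sh is a hypothetical script.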

Get Email When Your Job Finishes

Add these lines to the #SBATCH block at the top of your job script:

#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=yournetid@ucsb.edu

Scheduling Recurring Jobs with scrontab

SLURM has a built-in cron-like scheduler called scrontab for jobs that need to run on a schedule:

scrontab -e

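The schedule syntax matches standard cron. Lines beginning with #SCRON set sbatch-style options (time limit, memory, partition) for the entry that follows:

# Run a script every day at 5am with a 30-minute time limit
#SCRON --time=00:30:00
0 5 * * * /home/user/scripts/daily_data_pull.sh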

Unlike regular cron, scrontab jobs are managed through SLURM — they're queued like any other job, so they respect resource limits and the fair-share scheduler.

Monitoring Jobs with the Resource Utilization Analyzer

Requesting the right amount of resources is one of the harder parts of HPC workflows. Ask for too little and your job fails; ask for too much and you block other users — and your job may wait longer in the queue because fewer nodes can satisfy the request.

The Job Resource Utilization Analyzer, available in OpenOnDemand under Jobs, helps you understand how your jobs actually used the resources they were allocated. For each completed job, it shows:

CPU efficiency — how much of your requested CPU time was actually used
Memory efficiency — peak memory usage vs. what you requested
GPU utilization (for GPU jobs) — whether your GPU was kept busy

Use this tool after your first few runs to right-size your resource requests. A job that uses 4 GB of a 32 GB allocation is wasting cluster capacity and may be penalized by the fair-share scheduler over time.
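You can pull similar numbers from the command line with sacct, e.g. `sacct -j <jobid> --format=JobID,Elapsed,TotalCPU,AllocCPUS,MaxRSS,ReqMem`. MaxRSS is reported on step lines such as `<jobid>.batch`, not on the top-level job line. The sketch below shows the usual arithmetic: CPU efficiency is CPU time used divided by (elapsed time × allocated cores), and memory efficiency is peak RSS divided by requested memory. For example, a 4G peak against a 16G request is 25%. The helper names are hypothetical, the analyzer's exact formulas may differ, and the sketch assumes the newer sacct format in which ReqMem is a plain total such as '16G'.

```python
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def slurm_seconds(duration: str) -> float:
    """Convert a SLURM duration like '1-02:03:04', '02:03:04' or '03:04.500' to seconds."""
    days = 0
    if "-" in duration:
        day_part, duration = duration.split("-", 1)
        days = int(day_part)
    seconds = 0.0
    for field in duration.split(":"):  # fields run from largest unit to smallest
        seconds = seconds * 60 + float(field)
    return days * 86400 + seconds


def slurm_bytes(size: str) -> float:
    """Convert a sacct memory value like '3500K' or '16G' to bytes (blank counts as 0)."""
    if not size:
        return 0.0
    if size[-1] in _UNITS:
        return float(size[:-1]) * _UNITS[size[-1]]
    return float(size)


def cpu_efficiency(total_cpu: str, elapsed: str, alloc_cpus: int) -> float:
    """Fraction of the allocated core-time that was spent computing."""
    core_seconds = slurm_seconds(elapsed) * alloc_cpus
    return slurm_seconds(total_cpu) / core_seconds if core_seconds else 0.0


def mem_efficiency(max_rss: str, req_mem: str) -> float:
    """Peak resident memory as a fraction of the memory requested."""
    requested = slurm_bytes(req_mem)
    return slurm_bytes(max_rss) / requested if requested else 0.0
```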

Installing Miniforge3

Since conda is not available as a system module on GRIT HPC, each user installs their own Miniforge3 in their home directory. You only need to do this once.

Miniforge3 is preferred over Miniconda because it defaults to the conda-forge channel — a fully open-source, community-maintained package repository with broad scientific package coverage and no licensing restrictions. It also comes with mamba, a faster drop-in replacement for conda that handles complex environment resolution much more quickly.

Installation

Download and run the Miniforge3 installer from an HPC shell session:

wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh

During installation:

Accept the license agreement
When prompted for an install location, accept the default (/home/<user>/miniforge3)
When asked whether to run conda init, say yes — this adds the necessary initialization block to your .bashrc, which is required for conda to work in SLURM job scripts

Once installation completes, reload your shell:

source ~/.bashrc

Verify the install:

conda --version
mamba --version

Keeping Your Home Directory Clean

Conda environments and package caches can grow large quickly. A few habits help keep your quota in check:

Store environments in the default location (~/miniforge3/envs/) — they'll be accessible from any node since home directories are network-mounted.

Clean the package cache periodically. Conda caches downloaded packages even after installation, which can silently consume several GB:

mamba clean --all

Remove environments you no longer need:

conda env remove -n old_env

Check your disk usage:

du -sh ~/miniforge3/
zfs list  # check overall quota
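To see which part of ~/miniforge3 takes the space, here is a minimal Python sketch. Conda hard-links files from pkgs/ into environments, so it counts each file (inode) only once, as du does. Scanning pkgs/ first means envs/ shows only space not shared with the cache. It reports apparent file sizes, so totals will differ slightly from du, which counts disk blocks. The tree_size helper is a hypothetical name.

```python
import os
from pathlib import Path


def tree_size(path: Path, seen: set) -> int:
    """Apparent size in bytes of files under path, skipping inodes already in `seen`."""
    total = 0
    for root, _dirs, files in os.walk(path):  # os.walk does not follow symlinked dirs
        for name in files:
            st = os.lstat(os.path.join(root, name))
            key = (st.st_dev, st.st_ino)
            if key not in seen:  # a hard link to an already-counted file adds nothing
                seen.add(key)
                total += st.st_size
    return total


if __name__ == "__main__":
    base = Path.home() / "miniforge3"
    seen: set = set()
    for sub in ("pkgs", "envs"):
        size_gb = tree_size(base / sub, seen) / 1024**3
        print(f"{sub:>5}: {size_gb:.2f} GB")
```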

Creating and Managing Environments

If you have an environment.yml file that defines a conda environment, run the following from the directory that contains it:

conda env create -f environment.yml

Create a new environment with a specific Python version:

conda create -n myenv python=3.11

Activate and install packages — use mamba instead of conda for faster installs:

conda activate myenv
mamba install numpy pandas matplotlib

Export an environment for reproducibility:

conda env export > environment.yml

List all your environments:

conda env list

Note: Always create and test environments interactively before referencing them in a SLURM job script. See the Conda Environments in SLURM Jobs section below for how to activate them correctly in batch jobs.

Conda Environments in SLURM Jobs

This is one of the most common points of confusion for new HPC users. When SLURM runs your job script, it starts a non-interactive shell that does not automatically load your conda setup. If you call conda activate myenv without initializing conda first, the job will fail with an error such as "conda: command not found" or "Run 'conda init' before 'conda activate'", or your code may silently run with the wrong Python.

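The fix is to source your .bashrc, where conda init added its setup block, before activating any environment. Some distributions' default .bashrc returns early for non-interactive shells. If yours does, source ~/miniforge3/etc/profile.d/conda.sh instead. A complete job script: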

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=32G
#SBATCH --time=04:00:00
#SBATCH --job-name=species-model
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

# Required: initialize conda in the non-interactive shell
source ~/.bashrc

# Activate your environment
conda activate myenv

# Run your code
python species_distribution_model.py

A few things to keep in mind:

Your environment must exist on the node. Because home directories are network-mounted and shared across all nodes, conda environments stored in /home/<user>/ are accessible from any node SLURM assigns you. Environments stored in scratch are not, since scratch is local to each machine.

Avoid installing packages in your job script. Running conda install during a job is slow. It can fail if the compute node can't reach the package servers or if another job is changing the same environment, and it alters the environment for every other job that uses it. Create and test your environment interactively first, then reference it in your job script.

Check that the right environment is active by adding a quick sanity check at the top of your script:

source ~/.bashrc
conda activate myenv
echo "Python: $(which python)"
echo "Env: $CONDA_DEFAULT_ENV"

This gets logged to your .out file and saves a lot of debugging time if something goes wrong.
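You can also make the Python script itself refuse to run in the wrong environment, so a misconfigured job fails at its first line instead of producing misleading results. This is a minimal sketch, and require_conda_env is a hypothetical helper. It relies on the CONDA_DEFAULT_ENV variable that conda activate sets, the same one echoed above.

```python
import os
import sys


def require_conda_env(expected: str) -> None:
    """Exit with a clear message (status 1) unless conda environment `expected` is active."""
    active = os.environ.get("CONDA_DEFAULT_ENV")
    if active != expected:
        sys.exit(
            f"Expected conda env {expected!r} but found {active!r} "
            f"(python: {sys.executable})"
        )
```

Calling `require_conda_env("myenv")` at the top of species_distribution_model.py writes the message to your .err file and stops the job if activation didn't happen.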

Registering a Conda Environment as a Jupyter Kernel

If you want to use a conda environment in JupyterLab or a Jupyter Notebook — including sessions launched via OpenOnDemand — you need to register it as a kernel. Just activating an environment is not enough; Jupyter maintains its own list of kernels separately.

With your environment activated, install ipykernel and register it:

conda activate myenv
mamba install ipykernel
python -m ipykernel install --user --name myenv --display-name "Python (myenv)"

--name is the internal identifier (no spaces)
--display-name is what appears in the JupyterLab kernel picker — make it descriptive

After registering, the kernel will appear in JupyterLab automatically the next time you start or refresh a session. No restart of JupyterHub is required.

List registered kernels:

jupyter kernelspec list

Remove a kernel you no longer need:

jupyter kernelspec remove myenv

Removing a kernel only unregisters it from Jupyter — it does not delete the conda environment itself.
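The reverse also holds: deleting a conda environment leaves its kernel registered, and JupyterLab will fail to start it. The minimal sketch below flags such stale kernels by checking the interpreter path recorded in each kernel.json. It assumes the per-user Linux location that `ipykernel install --user` writes to, ~/.local/share/jupyter/kernels. That location can differ if JUPYTER_DATA_DIR or XDG_DATA_HOME is set. The stale_kernels helper is a hypothetical name.

```python
import json
from pathlib import Path


def stale_kernels(kernels_dir: Path) -> list:
    """Names of kernels whose interpreter (argv[0] in kernel.json) no longer exists."""
    if not kernels_dir.is_dir():
        return []
    stale = []
    for spec_file in sorted(kernels_dir.glob("*/kernel.json")):
        argv = json.loads(spec_file.read_text()).get("argv", [])
        # Only absolute paths can be checked; bare names like "python" are resolved via PATH
        if argv and Path(argv[0]).is_absolute() and not Path(argv[0]).exists():
            stale.append(spec_file.parent.name)
    return stale


if __name__ == "__main__":
    user_kernels = Path.home() / ".local" / "share" / "jupyter" / "kernels"
    for name in stale_kernels(user_kernels):
        print(f"{name}: interpreter missing; remove with 'jupyter kernelspec remove {name}'")
```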

LLM Server: AI Tools on the HPC

GRIT provides access to a locally hosted LLM server running on the HPC cluster. This allows you to use large language models — for coding assistance, writing, data analysis, and more — without sending your data to external services like OpenAI or Anthropic.

Why This Matters for Research Data

When you use a commercial LLM API or web interface, your prompts and any data you paste in are transmitted to a third-party server. For research involving sensitive data — human subjects data, proprietary datasets, pre-publication findings, or anything covered by a data use agreement — this can create compliance and confidentiality risks.

The GRIT LLM server keeps everything local: the models run on HPC hardware, your prompts never leave the UCSB network, and no data is retained or used for model training by an outside vendor. This makes it appropriate for use with research data that you wouldn't feel comfortable pasting into a public chat interface.

Accessing the LLM Server

The LLM server is accessible through OpenOnDemand or directly via API at the GRIT-hosted endpoint. Available models are updated periodically — check the GRIT documentation for the current model list and connection instructions.

You can interact with the models through:

A browser-based chat interface for conversational use: https://llm.grit.ucsb.edu
The API endpoint for programmatic access from Python, R, or other tools (compatible with the OpenAI client library)

Using the API in Your Code

The GRIT LLM API is OpenAI-compatible, so you can use the standard openai Python library by pointing it at the local endpoint:

from openai import OpenAI

client = OpenAI(
    base_url="https://llm.grit.ucsb.edu/v1",  # GRIT endpoint
    api_key="your-grit-api-key"
)

response = client.chat.completions.create(
    model="llama3",  # check current available models in GRIT docs
    messages=[{"role": "user", "content": "Summarize this dataset description..."}]
)
print(response.choices[0].message.content)

This same pattern works in R via the httr2 or ellmer packages.
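Available models change over time, so it can help to ask the server which ones it currently serves instead of hard-coding a name. The sketch below is minimal and assumes the GRIT endpoint implements the standard /v1/models route. Most OpenAI-compatible servers do, but confirm this in the GRIT docs. GRIT_LLM_API_KEY is a hypothetical environment variable name, used here to keep the key out of your code and notebooks.

```python
import os


def make_client():
    """Build an OpenAI-compatible client for the GRIT endpoint (requires the openai package)."""
    from openai import OpenAI

    return OpenAI(
        base_url="https://llm.grit.ucsb.edu/v1",
        api_key=os.environ["GRIT_LLM_API_KEY"],  # hypothetical variable name
    )


def model_ids(client) -> list:
    """Sorted IDs of the models the server advertises via client.models.list()."""
    return sorted(model.id for model in client.models.list())
```

Calling `print(model_ids(make_client()))` lists the IDs. Pass one of them as `model=` to `client.chat.completions.create`.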

Quick Reference

Task	Command
Submit a job	`sbatch my_job.sh`
Check your jobs	`squeue -u $USER`
View full queue	`squeue -a`
Cancel a job	`scancel <jobid>`
Watch job output	`tail -f jobname-id.out`
Check disk space	`zfs list`
Copy data to HPC	`rsync -avr /local/data/ user@hpc.grit.ucsb.edu:/remote/path/`
Dry-run rsync	`rsync -avr --dry-run <src> <dst>`
Schedule recurring job	`scrontab -e`

Additional Resources