Shlok Channawar

Junior at Penn State studying Applied Data Science. I work on interpretability and safety — trying to figure out what's actually happening inside language models.

01 About
Background

I started college as a mechanical engineering major before switching to Applied Data Science at Penn State's College of IST. That pivot led me toward AI research, specifically interpretability and safety.

I recently co-first-authored a paper on whether geometric properties of sparse autoencoder (SAE) decoder vectors can predict feature steerability; it is currently under review. I'm now working on two new threads: applying interpretability methods to understand how models handle private information, and building practical mech interp tooling for finance. I'm also attending BlueDot's AI Safety program and thinking carefully about alignment and what it actually takes to make these systems safe.

Outside of research, I play poker with friends, play chess, listen to a lot of music, and just hang out. Originally from Nagpur, India. I also love astronomy and astrophotography — you can see some of my shots here.

02 Research & Projects
01

Predict Before You Steer

Under Review

Working with Algoverse on whether geometric properties of SAE features can predict how steerable they are, before you ever run a steering experiment. We look at neighbor density, co-activation patterns, and an alpha_star metric across features in models such as Gemma and Llama, evaluated on SALADBench.

SAE · Mechanistic Interpretability
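Neighbor density can be made concrete with a small sketch. The definition below is an assumption for illustration and is not necessarily the one used in the paper. A feature's neighbor density is taken to be the number of other SAE decoder vectors whose cosine similarity with it exceeds a threshold. The function names, the 0.5 threshold, and the toy decoder matrix are all hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def neighbor_density(decoder: List[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Hypothetical metric: for each decoder vector (one per SAE feature),
    count how many *other* decoder vectors have cosine similarity above
    `threshold`. Illustrative definition only, not the paper's."""
    counts = []
    for i, d_i in enumerate(decoder):
        count = sum(
            1
            for j, d_j in enumerate(decoder)
            if j != i and cosine(d_i, d_j) > threshold
        )
        counts.append(count)
    return counts


# Toy decoder matrix: 4 features in a 3-dimensional residual stream.
toy_decoder = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],  # nearly parallel to feature 0
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
print(neighbor_density(toy_decoder, threshold=0.5))  # → [1, 1, 0, 0]
```

A real decoder matrix would come from a trained SAE and the pairwise loop would be vectorized. The quadratic pure-Python version here only keeps the example self-contained.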
02

Quantization Safety

In Progress

Working with Penn State collaborators to pin down exactly why post-training quantization can quietly degrade a model's safety alignment. We introduce a V-score diagnostic and identify read-side collapse as the core failure mechanism.

Quantization · Safety Alignment
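For readers new to post-training quantization, here is a sketch of one common scheme: symmetric per-tensor int8 round-to-nearest, applied to a handful of weights with no retraining. The scheme, the function names, and the weight values are assumptions for illustration. The project may study other quantization methods, and this is not the V-score diagnostic.

```python
from typing import List, Tuple


def quantize_int8_symmetric(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: one scale for the whole tensor,
    chosen so the largest-magnitude weight maps to +/-127."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round each weight to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to floats, as happens at inference time."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.003, 0.8]  # hypothetical weights
q, scale = quantize_int8_symmetric(weights)
restored = dequantize(q, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]
print(q)
print(errors)
```

Every weight moves by up to half a quantization step, and a small weight like 0.003 collapses to zero. Across a whole model, this is the kind of perturbation post-training quantization introduces.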
03 Reading Log

Papers I've been reading

Notes on what they do and why they matter.

04 Get in Touch

Always happy to talk interpretability, safety, or anything in between.