
Courtney Sims


Animals Playing Music

July 29, 2024 by courtneysims

Earlier this summer, I participated in a hackathon. It was an un-themed, free-for-all hackathon (my favorite kind), so, naturally, I asked myself — what if we could use technology to help animals play music?

Credit to emperor_slime

Observation

I live with a herding breed dog. If you know anything about these kinds of dogs, you know they’re always staring at their people, waiting for something to do. They’re very smart and very active. If you too have lived with a herding breed dog, you know how easily they get bored and how much they rely on you to fix this for them. You also have probably developed some empathy for this situation and feel a little guilty when you do activities that don’t include them. Or maybe that’s just me, but that’s where this whole thing started.

I like to play music, but often when I sit down to play, my dog stares at me with her sweet puppy dog eyes and, instead of feeling happy about playing music, I feel sad that she can’t play too.

Question

What if it didn’t have to be like this? What if Penny (or Luna, the cat!) could join the band?

Hypothesis

  1. If I arrange a way for Penny and Luna to step on something that produces musical sound, they will choose to do this for a reward.
  2. The stepping can be done in a way that complements music being played by others.

Attempt #1

What happens when I try to convince them to play an instrument without technology? Maybe we don’t need tech after all. Let’s find out . . .

Materials

  • Piano
  • Tiny scoop of cat food
  • Cat toy

Methodology & Results

Luna

  1. Attempt to lure Luna onto the piano with a scoop of food. -> No response.
  2. Attempt to lure Luna onto the piano bench with a toy. -> No response. Absolutely no interest in approaching the piano for any reason.

Penny

  1. Leverage previous trick experience to get Penny to put her paws on the piano bench. -> Success!
  2. Leverage “shake” cue to get Penny to put her paw on the piano keys. -> Somewhat successful . . . but the sound was not very pleasant and she looked physically uncomfortable in that position . . . and tbh I was concerned she would hurt the piano.

Attempt #2

What if we use a little bit of physics instead of tech? How much better is that?

Materials

  • Small MIDI keyboard placed on the ground
  • Some random hook that the person who owned my house before me left in the garage
    • Could be anything you can turn into a lever though
  • Box from an Amazon delivery
  • Box cutter

Methodology

  1. Set up a MIDI keyboard to create a fun sound with the push of adjacent keys.
  2. Cut small pieces of cardboard for stacking.
  3. Create a simple machine (by placing the weird hook on pieces of cardboard) to concentrate the weight of the paw onto those specific keys (and simultaneously protect the keyboard from direct paw->key contact).

Results

Luna

Zero interest. Not good.

Penny

Not bad.

Attempt #3

Okay, let’s think more about the user experience here. What if there was something flat on the ground that could just be walked across to produce sound? Can we create a pressure sensor and somehow connect that to a music producing device?

Materials

  • More cardboard from the Amazon box
  • Aluminum foil
  • Wire
  • Wire strippers (or a knife, or a fingernail)
  • Tape
  • Raspberry Pi
    • Connected to a monitor, keyboard, and mouse
  • Breadboard
  • Breadboard jumper wires
  • Resistor
  • Optional
    • Battery
    • LED

Methodology

  1. Follow steps 1-5 of this guide to create a DIY pressure sensor. It is shockingly straightforward.

Optional: to confirm the pressure sensor is working, connect the wires to a battery and the LED.

  2. Follow this guide to connect the Raspberry Pi to the pressure pad. Choose the basic setup option with the resistor and, of course, the DIY pressure pad instead of a store-bought sensor. Strip the ends of the wires to insert them into the breadboard, just like when taping them to the aluminum foil earlier.

  3. The code is super simple. Use the RPi.GPIO library for the breadboard connection and the pygame library for the music. After trying several sound types, I left it on piano, which can be heard in the videos.
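
For the curious, here’s a minimal sketch of that kind of loop (not the exact hackathon code). The GPIO pin number and the sample file name are placeholder assumptions; your wiring and sounds will differ.

```python
# Minimal sketch, assuming the pressure pad is wired so the pin reads high when
# pressed. The pin number and sample file are placeholders.
import time

import pygame
import RPi.GPIO as GPIO

PAD_PIN = 17  # GPIO pin connected to the DIY pressure pad (assumption)

GPIO.setmode(GPIO.BCM)
GPIO.setup(PAD_PIN, GPIO.IN, pull_up_down=GPIO.PUD_DOWN)

pygame.mixer.init()
note = pygame.mixer.Sound("piano_note.wav")  # hypothetical sample file

try:
    while True:
        # A paw pressing the pad brings the foil sheets together and the pin reads high.
        if GPIO.input(PAD_PIN):
            note.play()
            # Wait for the paw to lift so one step produces one note.
            while GPIO.input(PAD_PIN):
                time.sleep(0.01)
        time.sleep(0.01)
finally:
    GPIO.cleanup()
```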

Results

Look at them go!!

Conclusion

This was a good MVP. I created a device that produced musical sound and that my users seemed to like.

Looking back at my hypotheses, I validated one of the two. When I arranged a way for Penny and Luna to step on something that produced musical sound, they did in fact choose to do this for a reward.

The next steps for this project would then be validating the second hypothesis, which I sadly did not have time for during this hackathon: Configure the device to complement music being played by others. There are so many options for this! Can the animals play a chord at the right time? Can they play a series of recorded notes on a consistent beat? Could we time the reward dispensation to a pattern that facilitates this? Or maybe the animals could have their own versions of a guitar solo at a particular point in a song?

Some things to work out for this next phase are how to work with more advanced sounds in the code (I couldn’t get my own mp3 files to play, only the samples), how to automatically dispense the rewards, and how to keep the animals nearby and engaged when rewards are not being dispensed.


Using Neuroscope to Explore LLM Interpretability: A Guide for Anxious Software Engineers

June 10, 2024 by courtneysims

Over the past month I’ve been exploring mechanistic interpretability – a field dedicated to understanding the internal workings of machine learning models. 

I set out to find patterns in LLMs, but the real value I got from this project was finding a pattern in myself. I hope that anyone reading this who discovers that pattern in themselves, too, realizes that not only are they not alone, but that they are powerful and capable.

How did we get here?

The AISF Course

Ever since I read Superintelligence, I’ve been equally fascinated by both the possibilities and problems that come along with the quest for artificial general intelligence (human-grade AI). So I was both excited and nervous to be accepted into BlueDot’s AI Safety Fundamentals: Alignment course. Excited, for hopefully obvious reasons, but nervous because my machine learning background was minimal. I wondered how many others there would be like me.

This type of situation has been a pattern in my life. I started physics classes early in my high school career so I could take more of them, yet worried how I would measure up to the older students with higher level math experience. I took a job in technical support after college because I became fascinated by how the internet worked, but worried how I would fit in having next to no computer science experience and consistently realizing I was the only woman in the room at work. I followed that line of interest into a software engineering role and, once again, struggled with the idea that I didn’t have a CS degree or a passionate history of playing with Linux machines and writing Java during summer vacations like my peers.

Now here I was, again, putting myself into a situation where I felt excited and interested, but also underprepared and out of place.

The Project

The final month of the course was dedicated to an individual project. After learning about mechanistic interpretability, I knew I wanted my project to be in this space. Mech interp seeks to look inside the black box that is a machine learning model and understand exactly how it makes sense of data we give it to generate an output – and that’s cool af. I wanted my project to reflect the amazement I felt, but with only a basic software engineering background, having only read the articles from the coursework with the barest understanding of what was actually technically happening in them, and no idea what a transformer was . . . I worried about what I could actually accomplish.

Throughout the project, we were encouraged to read less, do more – to dive in. While this is not something I do often, I wanted the full experience of the course as intended, so instead of finding some long tutorial that I’d never actually finish about how transformers work, I embraced the directive and committed myself to finding a do-able project with an actual output.

When I came across this list of 200 open problems to explore in the field of mechanistic interpretability, I started to feel hopeful. They are laid out by category and – most importantly, for me – by difficulty. I found the most beginner-est of beginner problems and set my sights on that. It had a simple, exploratory goal – “search for patterns in neuroscope and see if they hold up.”

Okay, what is neuroscope?

Quick LLM Refresher

When text is input to an LLM, it’s broken down into different segments for the model to work with. These segments of text are called tokens. For example, the text “Hi, I am Courtney” might be broken down by a model into this list of tokens [“Hi”, “,”, “I”, “am”, “Courtney”].

LLMs are neural networks, so when data is passed into them, the neurons making up the network learn to activate based on different aspects of that data. Basically, each neuron is going to really like some tokens and not care about others. The neurons each activate on different parts of the text, but they connect with one another to turn all of those different activations into a (hopefully!) meaningful output.

Okay, back to neuroscope

Neuroscope is a website created by mechanistic interpretability researcher Neel Nanda that catalogs every neuron in a given LLM. Each neuron has its own page on the site, which shows the twenty segments of text that most activated that specific neuron. Within those segments of text, the tokens that most strongly activated the neuron are highlighted in increasing shades of blue. The more intense the color, the more “excited” that neuron felt about the word.

The website leverages a Python library called TransformerLens to do this. It’s called TransformerLens because it looks at transformers. Naturally, the question then arises: what is a transformer? All I can say for now is that it’s the architecture used to create LLMs. Beyond that most basic understanding, I still don’t actually know.

The Plan

My project plan was to see if I could detect commonalities in the texts that most strongly activated a given neuron. I wanted to find neurons that seemed to be highlighting the same types of numbers or words in a pattern that could suggest some feature that the neuron had been trained to represent within the model.

I wanted to know how hard this actually was to do and how many iterations through random neurons it would take to find something. I expected to spend the entire project searching for an elusive single pattern. Instead, I found what might be several. The following is a breakdown of the questions I asked, the hypotheses I formed, and the answers I found.

Phase 1: Exploring neuroscope

How hard was it to find a pattern?

This honestly took less time than I expected. I wish I had logged how many neurons I clicked through, but all I can say is that it didn’t take more than a few hours to find nine potential patterns.

I selected the first model in the supported models list (because why not?), solu-1l. Then I clicked to view a “random” neuron in that model over and over again, skimming through the texts on each page and looking for human-readable indications of patterns. Specifically, this meant looking for repetition (e.g. successive numbers were always highlighted) or a common theme (e.g. all the texts were recipes).

How would I know if I truly found a feature?

The next step was verifying whether or not the pattern I thought I found held up to scrutiny. To do this, I decided to pull from the arguments for curve detector neurons in visual models to craft a test. I found and created examples that were not in the initial set of texts from neuroscope, but which matched the pattern I thought I was seeing. I also created texts that specifically did not match the pattern to ensure these texts did not generate strong activations.

Then, I used the interactive version of neuroscope — a notebook that can do what the neuroscope website does, but on demand. In the notebook, you can specify your model and your neuron, and then pass in any input text that you want to see how much the neuron “likes” it (based on how intensely the words are highlighted).
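
For reference, here’s a minimal sketch of that same check done directly with TransformerLens rather than the notebook (this is not the notebook’s actual code). The model, layer, neuron, and input text below are just examples from this post.

```python
# Minimal sketch of asking "how much does this neuron like this text?"
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("solu-1l")
LAYER, NEURON = 0, 737  # example neuron from this post

text = "a) Aurora b) Penumbra c) Luna"
tokens = model.to_str_tokens(text)

# Run the model, cache intermediate activations, then pull this neuron's
# post-MLP activation at every token position.
_, cache = model.run_with_cache(text)
acts = cache[utils.get_act_name("post", LAYER)][0, :, NEURON]

for token, act in zip(tokens, acts.tolist()):
    print(f"{token!r}: {act:.3f}")
```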

If I shared the details of all nine patterns I tested, this might become unreadable, so I’ll just share two – one that seemed promising in light of my hypothesis and one that did not. For each one, I first validated that the interactive neuroscope was working as expected by loading the appropriate model and neuron (under “Extracting Model Activations”), then running it on one of the strongly activating texts listed on the neuroscope website and confirming the outputs were the same.

Solu-1l, layer 0, neuron 260

Hypothesis

Activates strongly for “I”, “you”, and “we” in review or comment text.

Developing test texts

From Amazon’s homepage, I selected the first review from each of the first ten products shown. I compared this with the same set of texts, but with the pronouns stripped.

Examples

Matches the pattern:

  • I disliked nothing I loved it, perfect size
  • I recently purchased the Women’s Casual Long Sleeve Button Down Loose Striped Cotton Maxi Shirt Dress, and I absolutely love it! The material is soft and comfortable, and the loose fit is perfect for all-day wear. The striped pattern adds a nice touch of style, and the buttons down the front make it easy to put on and take off. And best of all, it has pockets. Overall, highly recommend!

Does not match the pattern:

  • Disliked nothing loved it, perfect size
  • Recently purchased the Women’s Casual Long Sleeve Button Down Loose Striped Cotton Maxi Shirt Dress, and absolutely love it! The material is soft and comfortable, and the loose fit is perfect for all-day wear. The striped pattern adds a nice touch of style, and the buttons down the front make it easy to put on and take off. And best of all, it has pockets. Overall, highly recommend!

Results

Out of ten sets of test texts, not a single one highlighted the pronouns of the review text as expected. There was no visual distinction between the test texts that matched the pattern and those that did not.

Solu-1l, layer 0, neuron 737

Hypothesis

Activates strongly for numbered or lettered lists when written as “x)”. 

Developing test texts

I used a combination of personally written lists and lists found from recipes that matched the pattern. I compared them to lists that deviated from the pattern.

Examples

Matches the pattern:

  • a) Aurora b) Penumbra c) Luna
  • “WHY YOU WILL LOVE THIS RECIPE!
    1) Creamy and Flavorful:
    2) Easy and Quick:
    3) Versatile and Adaptable:
    4) Nourishing and Satisfying:
    5) Fall Comfort Food: ”

Does not match the pattern:

  • Aurora Penumbra Luna
  • “WHY YOU WILL LOVE THIS RECIPE!
    1. Creamy and Flavorful:
    2. Easy and Quick:
    3. Versatile and Adaptable:
    4. Nourishing and Satisfying:
    5. Fall Comfort Food: ”

Results

Out of six sets of test texts, every single one highlighted the lists as expected!

Phase 2: I guess I’ll write some code now?

Phase 1 was originally intended to be the entirety of the project. I did not expect to be able to understand and use the tools so quickly. So what to do with the remaining project time?

I could have continued to search for patterns, but I wondered what else I could do. Again, though, I felt intimidated. Could I learn enough to do anything in such a short timespan (or ever, thought my anxious brain)? I remembered the excitement that I felt when initially learning about mechanistic interpretability and how freeing it felt to embark on a project focused around exploration rather than production. I wanted to continue in that mindset.

As I pondered my situation, the universality claim (that features represented in one model can be found represented in other models too) came to mind and a silly thought arose. Would it be possible to write a gigantic loop that iterated over every single neuron in an LLM and checked for one of the features I thought I found in phase 1? How long would that take to run? Would my laptop meet a fiery end?

That sounded fun. 

What I wrote

TransformerLens lets us look at a given text as a list of tokens. My plan was to programmatically get the indices of the most activating tokens for the neurons that I had tested in phase 1 and iterate through every single neuron in another model to see if any of those neurons activated strongly on the same indices.

You can see the code here.
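
For a rough idea of the approach, here’s a simplified sketch (not the linked code): find the positions that most excite a phase 1 neuron in solu-1l, then scan every neuron in a second model for strong activations at those same positions. The model names, the activation threshold, and the example text are placeholder assumptions.

```python
# Simplified sketch of the brute-force universality check described above.
import torch
from transformer_lens import HookedTransformer, utils

source_model = HookedTransformer.from_pretrained("solu-1l")
target_model = HookedTransformer.from_pretrained("gpt2-small")

SOURCE_LAYER, SOURCE_NEURON = 0, 737  # the "x)" list neuron from phase 1
THRESHOLD = 0.5                       # arbitrary cutoff for "activates strongly"
texts = ["8) singled out 9) personality"]  # strongly activating texts from phase 1

for text in texts:
    # Which token positions most excite the phase 1 neuron in the source model?
    _, source_cache = source_model.run_with_cache(text)
    source_acts = source_cache[utils.get_act_name("post", SOURCE_LAYER)][0, :, SOURCE_NEURON]
    top_positions = torch.topk(source_acts, k=3).indices

    # Scan every layer and neuron of the target model at those same positions.
    _, target_cache = target_model.run_with_cache(text)
    target_len = target_cache[utils.get_act_name("post", 0)].shape[1]
    # The two models may tokenize the text differently, so drop positions that
    # don't exist in the target model (this is the mismatch discussed below).
    positions = top_positions[top_positions < target_len]
    if len(positions) == 0:
        continue

    for layer in range(target_model.cfg.n_layers):
        acts = target_cache[utils.get_act_name("post", layer)][0]  # [pos, d_mlp]
        for neuron in range(target_model.cfg.d_mlp):
            if (acts[positions, neuron] > THRESHOLD).all():
                print(f"Candidate: layer {layer}, neuron {neuron} on {text!r}")
```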

Results

This code takes around 90 minutes to run, but it did in fact complete! There is a nest of three for loops in there, which is always fun, but rarely practical (unless you really need that time to make a sandwich, which I did). 

The code runs, but I have yet to find neurons representing the same features in another model. And I think this could be why —

Model name: solu-1l
Text: 8) singled out 9) personality
Total tokens: 8
Token breakdown:
  0 => ‘ 8’
  1 => ‘)’
  2 => ‘ sing’
  3 => ‘led’
  4 => ‘ out’
  5 => ‘ 9’
  6 => ‘)’
  7 => ‘ personality’

Model name: gpt2-small
Text: 8) singled out 9) personality
Total tokens: 7
Token breakdown:
  0 => ‘ 8’
  1 => ‘)’
  2 => ‘ singled’
  3 => ‘ out’
  4 => ‘ 9’
  5 => ‘)’
  6 => ‘ personality’

The same piece of text may not be tokenized the same way for each model. This means when looking for activations in the new model at the indices of the most activating tokens from my initial model (where the feature was discovered), I’m looking at totally different pieces of text. To highlight this issue with the example above, if the number “9” was very strongly activating for the neuron in solu-1l (my initial model), that would be index 5 in the list of tokens for that model. If I looked for neurons in gpt2-small that activated at index 5 of the list of tokens it generated from the same exact text, I would be looking for neurons that were strongly activated by a closing parenthesis.
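
You can see the mismatch for yourself with a couple of lines of TransformerLens (exact splits depend on each model’s tokenizer, and a beginning-of-sequence token may be prepended):

```python
# Print how each model tokenizes the same text; the same index can land on
# different tokens, which is what breaks position-by-position comparison.
from transformer_lens import HookedTransformer

text = "8) singled out 9) personality"
for name in ["solu-1l", "gpt2-small"]:
    model = HookedTransformer.from_pretrained(name)
    tokens = model.to_str_tokens(text)
    print(f"{name}: {len(tokens)} tokens -> {tokens}")
```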

Conclusion

Why did any of this matter?

My technical achievements throughout this project have been minimal at best – a couple of neuron activation patterns amateurishly tested, a draft PR for TransformerLens with some failing tests (that I didn’t touch?) because I like to leave code better than I found it, and a half-finished proof-of-concept for automating universality testing. But I did complete the goal I set out to accomplish — search for patterns in neuroscope and see if they hold up — and more.

Story Time

Reflecting on this project, I’m reminded of the time I decided to get a Red Hat certification. I wanted to know more about the internet and servers, but instead of diving in, I thought I would skill up first. Once I had a Linux administrator’s certification, surely then I would be knowledgeable and capable enough to get a job in this space. The problem with this approach, though, was that I had never had a job in this space. Most of the lessons and tasks from the book I was using to study either made no sense or seemed completely pointless.

I still do not have that certification. For a time, I internalized the experience as a failure in me to be able to understand and perform that kind of work. But after spending a year as a tech support agent, I started to get it.

The Story Is Relevant – I Promise

I have a pattern of doing this – of trying to insulate myself from challenge and failure by learning “the right way” and only taking action when I’m “prepared.” Like studying for a test. And yet, the times I’ve been most successful in life (outside of formal schooling) have been when I’ve gotten myself in slightly over my head and risen to meet the challenge. The times when I stopped trying to be perfect and let life happen. The times when I figured out what was important or how something worked by participating in it.

The real purpose behind me writing this post is to encourage other SWEs, other women, and anyone else who might worry that they aren’t smart or skilled enough to do things in mech interp, or more broadly within AI Safety, or even more broadly in life.

It’s okay to dip your toes in. It’s okay to not read every instruction manual and GitHub ReadMe and forum post before developing ideas and trying stuff. It’s okay to put yourself in a position where you’ll be pushed out of your comfort zone. It is scary and it is hard and you probably won’t be the smartest person in the room. You may, in fact, be the dumbest person in the room. But that doesn’t mean that you can’t do it or that you don’t belong there.

Other stuff I learned

  1. Unsurprisingly, a library all about transformers is difficult to go very deep with when you don’t know what a transformer is. To really understand the code and do more with it, I need a stronger understanding of how transformers work.
  2. For as small a field as mech interp is, it’s shockingly accessible, mostly because of Neel Nanda.
  3. Little tiny mistakes have always stumped me the most. I typed “solu-11” instead of “solu-1l” into the interactive neuroscope notebook and spent at least an hour trying to understand why the model wasn’t being found. I forgot to switch the layer number in the code a few times when swapping between models and spent at least an hour trying to understand why I was getting a bad index. The lesson here is twofold:
    1. As software engineers, it’s critical to raise error messages that are as clear as possible. This ensures we spend our time on the big problems rather than the silly ones.
    2. In my first foray into programming, I spent an entire afternoon trying to figure out how to type the “big x” in html . . . you know >< . . . as in something like <div><\div> . . . yeah. It was incredibly discouraging. But I’ve made enough of these tiny, silly mistakes now to know that a) documentation really needs to treat us like we’re five and b) a long walk and a good rubber ducky community are invaluable tools for a programmer.
  4. Hacking something together that’s not great, but still manages to get the job done is often something I find fun. I should do that more.

Mech Interp Stuff To Explore Next

  1. Are patterns harder or easier to detect in models with more layers?
  2. Is there a more efficient way to look for evidence of universality if I understand transformers better? What about using numpy or scipy to compare histograms of activations? (A rough sketch of that idea follows after this list.)
  3. What kind of fun stuff can we explore with circuits and circuitvis integration?
  4. How can we better help SWEs get involved in this space?
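
For item 2, here’s a very rough sketch of what comparing activation histograms could look like. The texts, the layer and neuron choices, and the overlap score are all placeholder assumptions, not something validated in this project.

```python
# Rough sketch of comparing two neurons' activation distributions with numpy.
# Texts, layer/neuron picks, and the overlap metric are placeholder assumptions.
import numpy as np
from transformer_lens import HookedTransformer, utils

texts = ["1) first 2) second 3) third", "a) Aurora b) Penumbra c) Luna"]

def neuron_activations(model_name: str, layer: int, neuron: int) -> np.ndarray:
    """Collect one neuron's post-MLP activations across all tokens of all texts."""
    model = HookedTransformer.from_pretrained(model_name)
    acts = []
    for text in texts:
        _, cache = model.run_with_cache(text)
        acts.extend(cache[utils.get_act_name("post", layer)][0, :, neuron].tolist())
    return np.array(acts)

a = neuron_activations("solu-1l", 0, 737)
b = neuron_activations("gpt2-small", 5, 123)  # an arbitrary neuron to compare against

# Histogram both sets of activations over shared bins and measure their overlap.
bins = np.linspace(min(a.min(), b.min()), max(a.max(), b.max()), 30)
hist_a, _ = np.histogram(a, bins=bins, density=True)
hist_b, _ = np.histogram(b, bins=bins, density=True)
overlap = np.minimum(hist_a, hist_b).sum() / np.maximum(hist_a, hist_b).sum()
print(f"Histogram overlap (1.0 = identical shapes): {overlap:.3f}")
```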

