Explanations
The neuron is looking for instances where the phrase "I mean" is used in a sentence
phrases indicating human behavior or inclinations
Negative Logits
Token       Logit
artifacts   -0.73
ifference   -0.70
gression    -0.68
htaking     -0.63
watching    -0.62
threads     -0.61
imped       -0.59
paying      -0.59
immune      -0.59
udeau       -0.58
Positive Logits
Token       Logit
acronym     1.47
slang       1.35
moniker     1.34
abbre       1.33
coined      1.31
term        1.29
nickname    1.28
name        1.23
referring   1.15
shorthand   1.14
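
Logit listings like the two above are commonly produced by projecting a neuron's output direction through the model's unembedding matrix and sorting the resulting per-token scores. The source does not say how its values were computed, so the following is only a minimal sketch under that assumption. The function name `top_logit_effects`, the toy vocabulary, and the toy weights are all hypothetical and chosen purely for illustration.

```python
def top_logit_effects(neuron_out, unembed, vocab, k=3):
    """Return (most negative, most positive) token/score pairs for one neuron.

    neuron_out: list of d floats, the neuron's output direction (assumed).
    unembed:    d rows of len(vocab) floats, a stand-in unembedding matrix (assumed).
    vocab:      list of token strings, one per unembedding column.
    """
    # Score for token j is the dot product of the neuron's output
    # direction with column j of the unembedding matrix.
    scores = [
        sum(neuron_out[i] * unembed[i][j] for i in range(len(neuron_out)))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # lowest scores first
    positive = ranked[::-1][:k]  # highest scores first
    return negative, positive


# Toy, made-up numbers for illustration only; not taken from any real model.
vocab = ["acronym", "slang", "watching", "paying"]
neuron_out = [1.0, -0.5]
unembed = [
    [1.0, 0.75, -0.25, -0.125],
    [-0.5, -0.5, 0.5, 1.0],
]
negative, positive = top_logit_effects(neuron_out, unembed, vocab, k=2)
```

With these toy weights, `positive` ranks "acronym" above "slang" and `negative` ranks "paying" below "watching". The real dashboard values may also involve normalization steps that this sketch leaves out.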
Activations Density 0.670%
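
Activation density is usually read as the fraction of sampled tokens on which the neuron fires. The source gives neither the firing threshold nor the token sample, so this sketch assumes that "fires" means an activation strictly greater than zero. The function name `activation_density` and the sample values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly above `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 3 firing tokens out of 1000, for illustration only.
sample = [0.0] * 997 + [0.4, 1.2, 2.5]
density = activation_density(sample)
formatted = f"{density:.3%}"
print(formatted)
```

A density under 1%, like the one reported above, would mean under this definition that the neuron is silent on the vast majority of tokens.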