Explanations
language
This neuron activates on mentions of "language" and related terms such as "script", firing when the text discusses a specific language or writing system.
Negative Logits
luck           -0.07
dynamically    -0.06
items          -0.06
unicorn        -0.06
travels        -0.06
jer            -0.06
Colts          -0.06
жив            -0.06
طی             -0.06
าช             -0.06
Positive Logits
TableViewCell  0.07
rotein         0.07
educated       0.06
indows         0.06
_pixels        0.06
روم            0.06
composing      0.06
cellphone      0.06
pourrait       0.06
])             0.06
Activation Density: 0.031%
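
The page does not say how the density figure is computed. A common reading is that it is the share of token positions on which the neuron's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds threshold.

    Assumption: "density" means the fraction of tokens on which the neuron fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy data: 10,000 token positions, 3 of which fire.
toy = [0.0] * 9997 + [0.8, 1.2, 0.4]
density = activation_density(toy)
print(f"{density:.3f}%")  # prints "0.030%"
```

On this reading, a density of 0.031% would mean the neuron fires on roughly 3 of every 10,000 tokens, which fits a feature tied to a narrow topic such as mentions of a language or script.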