Explanations
like want
This neuron detects expressions of desire or intent, such as the words “want,” “like,” or “would like.”
Negative Logits
BUTTON      -0.07
flags       -0.07
Think       -0.07
girls       -0.07
Girls       -0.07
shifted     -0.06
tourists    -0.06
_t          -0.06
swallowed   -0.06
remain      -0.06
Positive Logits
stockholm   0.07
_ALARM      0.07
şar         0.06
velkou      0.06
frauen      0.06
fri         0.06
atd         0.06
mexico      0.06
_Do         0.06
друго       0.06
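The page does not state how the Negative Logits and Positive Logits lists are produced. A common approach, assumed here, is to project the neuron's output weights through the model's unembedding matrix and rank every vocabulary token by the resulting direct effect on its logit. The minimal sketch below illustrates that ranking; `direct_logit_effect`, `top_logits`, and the toy weights and tokens are hypothetical and not taken from this dashboard.

```python
from typing import List, Sequence, Tuple


def direct_logit_effect(w_out: Sequence[float], w_u: Sequence[Sequence[float]]) -> List[float]:
    """Project a neuron's output direction onto every vocabulary token.

    w_out: the neuron's output weights in the residual stream (length d_model).
    w_u:   a hypothetical unembedding matrix with d_model rows and vocab_size columns.
    """
    d_model = len(w_out)
    vocab_size = len(w_u[0])
    # Effect on token t = dot product of w_out with column t of the unembedding.
    return [sum(w_out[i] * w_u[i][t] for i in range(d_model)) for t in range(vocab_size)]


def top_logits(
    tokens: Sequence[str], effects: Sequence[float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most suppressed tokens first
    positive = ranked[::-1][:k]    # most promoted tokens first
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
tokens = ["a", "b", "c"]
w_out = [1.0, -0.5]
w_u = [[0.2, -0.1, 0.0],
       [0.1, 0.3, -0.2]]
effects = direct_logit_effect(w_out, w_u)
negative, positive = top_logits(tokens, effects, k=1)
print(negative, positive)
```

With real model weights, the same ranking with k=10 would yield two ten-token lists like the ones above; the actual dashboard may also apply normalization or filtering that this sketch omits.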
Activation density: 0.034%
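The activation density is presumably the percentage of tokens in the evaluation dataset on which the neuron fires. The sketch below assumes that "fires" means an activation strictly above a threshold of 0; the function name `activation_density` and the threshold are hypothetical, not the dashboard's documented definition.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy example: 34 firing tokens out of 100,000.
acts = [1.0] * 34 + [0.0] * (100_000 - 34)
print(f"{activation_density(acts):.3f}%")  # prints 0.034%
```

Under this reading, a density of 0.034% means the neuron fires on roughly 34 of every 100,000 tokens, so it is very sparse.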