INDEX
Explanations
This neuron fires on occurrences of the word “both.”
Negative Logits
Token         Logit
fraction      -0.06
-automatic    -0.06
міль          -0.06
LinkedList    -0.06
cruc          -0.06
Box           -0.06
ολ            -0.06
dál           -0.06
•↵↵           -0.06
13            -0.06
Positive Logits
Token         Logit
obou          0.07
bic           0.06
recv          0.06
sonucunda     0.06
abusive       0.06
updater       0.06
.bz           0.06
zprav         0.06
letras        0.06
isten         0.06
Activations Density 0.007%