INDEX
Explanations
the word "English" specifically
references to the English language
Negative Logits
Token     Logit
Sensor    -0.71
ront      -0.68
mounts    -0.68
orsi      -0.66
incent    -0.66
@#        -0.66
[&        -0.64
Romo      -0.64
raz       -0.63
kos       -0.63
Positive Logits
Token        Logit
English      3.63
English      2.98
english      2.84
english      1.93
Spanish      1.88
Arabic       1.84
Portuguese   1.77
Hindi        1.74
French       1.72
Welsh        1.69
Activations Density 0.022%