INDEX
Explanations
references to government, military, and health-related topics
Negative Logits
Token               Logit
abbit               -0.20
(blank in source)   -0.19
adic                -0.18
adena               -0.18
aden                -0.17
alc                 -0.17
abb                 -0.16
agli                -0.16
wheel               -0.16
αλ                  -0.16
Positive Logits
Token               Logit
them                0.18
ihnen               0.17
they                0.17
ayd                 0.17
them                0.17
Ay                  0.17
Barney              0.16
Baz                 0.16
avant               0.16
ayo                 0.16
Activations Density: 0.052%