Explanations
phrases indicating being off from a position or situation
Negative Logits
orners      -0.17
iller       -0.16
خاÙĨÙĩ       -0.15
abinet      -0.15
opoulos     -0.15
ç¬          -0.14
Doub        -0.14
lar         -0.14
kbd         -0.13
Rak         -0.13
Positive Logits
kaar        0.17
Cobb        0.15
ifen        0.15
alli        0.15
ilty        0.14
erner       0.14
langs       0.14
indef       0.14
ACHE        0.14
-*          0.14
Activations Density 0.007%