Explanations
negative or questioning phrases or concepts
Negative Logits
kaar            -0.16
aci             -0.16
bon             -0.15
.scalablytyped  -0.14
ulton           -0.14
alth            -0.14
uno             -0.14
nek             -0.14
ĵn              -0.14
antom           -0.13
Positive Logits
UPI              0.16
uddy             0.15
θή               0.14
rab              0.14
IGO              0.14
angi             0.14
Freem            0.14
repro            0.14
wells            0.13
woff             0.13
Activation Density: 0.003%