INDEX
Explanations
negations and terms related to absence or prohibition
Negative Logits
ogh       -0.19
sais      -0.15
agh       -0.15
romium    -0.15
ToWorld   -0.15
yre       -0.14
Swords    -0.14
lea       -0.14
лини      -0.14
sank      -0.14
Positive Logits
اÙĨÙĩ      0.16
bian      0.15
iddle     0.15
gle       0.15
moth      0.14
ãĤīãģĦ    0.14
zÄĻ       0.14
Monterey  0.14
AFX       0.14
KI        0.14
Activations Density 0.001%