INDEX
Explanations
phrases that question reality or seek clarification
Negative Logits
жен         -0.17
oux         -0.15
ouv         -0.15
iveau       -0.15
PELL        -0.15
бот         -0.15
alfa        -0.15
llib        -0.15
ERSIST      -0.14
erialize    -0.14
Positive Logits
ışık        0.15
we          0.14
fat         0.13
-то         0.13
SetName     0.13
artial      0.13
_exchange   0.13
fucking     0.13
sız         0.13
eject       0.13
Activations Density 0.037%