Explanations
terms related to illusions and deceptive perceptions
illusion, delusion, hallucination
Negative Logits
(token not rendered)   -0.46
⎧                      -0.44
IBA                    -0.44
rubrique               -0.41
mtable                 -0.41
getR                   -0.41
treba                  -0.40
therly                 -0.40
algemene               -0.40
INH                    -0.40
Positive Logits
illusion               1.91
Illusion               1.73
illusion               1.64
illusions              1.63
ilusión                1.20
ilu                    1.06
delusion               1.06
illusory               1.03
иллю                   0.91
disillusion            0.88
Activations Density 0.006%