INDEX
Explanations
phrases related to illusions and delusions
terms related to illusions and misleading perceptions
Negative Logits
secut         -0.67
Interstitial  -0.67
nan           -0.65
vez           -0.65
ippers        -0.64
ishops        -0.63
nard          -0.61
gars          -0.60
mins          -0.60
aches         -0.59
Positive Logits
istically     1.05
ary           1.02
illusion      0.98
ually         0.93
illusions     0.92
Illusion      0.92
ery           0.89
ibility       0.85
arial         0.83
ulence        0.83
Activations Density 0.032%