INDEX
Explanations
the word "illusion" with varying levels of emphasis
concepts related to illusions and delusions
Negative Logits
vez         -0.67
annis       -0.64
poon        -0.64
foreseen    -0.64
MIT         -0.63
nard        -0.63
Issues      -0.61
gars        -0.60
mins        -0.59
uled        -0.59
Positive Logits
illusion    1.03
ually       1.01
istically   0.94
Illusion    0.93
mir         0.89
illusions   0.88
ary         0.88
istical     0.84
arial       0.82
ual         0.80
Activations Density: 0.025%