INDEX
Explanations
words related to abstract concepts or ideas, particularly illusions and delusions
references to illusions and delusions
Negative Logits
Token           Logit
Interstitial    -0.85
alez            -0.67
bid             -0.65
foreseen        -0.64
Selected        -0.63
Issues          -0.62
ded             -0.61
Discussion      -0.61
missions        -0.61
nan             -0.61
Positive Logits
Token           Logit
illusion        1.22
Illusion        1.01
illusions       0.98
mir             0.96
istically       0.91
uration         0.90
ually           0.90
usional         0.88
ery             0.87
ary             0.82
Activations Density 0.014%
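
As a rough illustration of how a density figure like the one above might be computed, the sketch below assumes that "activations density" means the fraction of token positions on which the feature fires (activation above a threshold). That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; none of them come from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition: density = (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 14 of which activate the feature.
sample = [0.0] * 99_986 + [1.5] * 14
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this assumed definition, a density of 0.014% would correspond to the feature firing on roughly 14 of every 100,000 tokens.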