INDEX
Explanations
references to fictional elements
references to fictional concepts and settings
Negative Logits
ktop      -0.88
hammad    -0.79
xual      -0.79
Da        -0.77
ni        -0.75
feeding   -0.74
cler      -0.69
KI        -0.69
annis     -0.68
hens      -0.68
Positive Logits
ized        1.02
fictional   0.93
universes   0.90
acters      0.89
istically   0.86
ization     0.84
izations    0.83
ties        0.82
recre       0.82
portray     0.81
Activations Density: 0.031%