INDEX
Explanations
phrases related to fictional elements, such as fictional worlds, characters, and settings
references to fictional elements and settings
Negative Logits
ktop       -0.87
hammad     -0.82
gans       -0.76
feeding    -0.74
hens       -0.72
VL         -0.71
erto       -0.70
sterdam    -0.70
cler       -0.70
chains     -0.69
Positive Logits
ized         1.03
istically    1.03
fictional    0.98
portray      0.96
universes    0.95
acters       0.94
recre        0.90
portrayal    0.89
ization      0.86
izations     0.86
Activations Density 0.018%