INDEX
Explanations
mentions of a specific fictional character
references to specific characters or names
Negative Logits
nonexistent      -0.71
elig             -0.68
lished           -0.67
fingerprint      -0.66
tert             -0.62
unsustainable    -0.60
notorious        -0.60
partName         -0.59
dism             -0.59
Entry            -0.59
Positive Logits
igans            0.95
bugs             0.92
kamp             0.86
felt             0.85
hoe              0.83
rake             0.77
bug              0.77
zman             0.75
ghan             0.74
gger             0.73
Activations Density: 0.078%