Explanations
mentions of specific names or references related to media entities
occurrences of specific names or identifiers in the text
Negative Logits
elig      -0.74
lished    -0.69
Rivals    -0.64
bluff     -0.63
obscene   -0.60
ultras    -0.60
Ultr      -0.60
ADRA      -0.60
OPLE      -0.58
obos      -0.58
Positive Logits
felt      0.96
igans     0.92
idan      0.80
sworth    0.72
kamp      0.72
zman      0.72
ghan      0.71
bugs      0.70
kens      0.69
atari     0.69
Activations Density 0.083%