INDEX
Explanations
mentions of specific individuals, primarily actors or characters in entertainment contexts
Negative Logits
choice       -0.80
ERAL         -0.77
areth        -0.74
ulate        -0.74
ulating      -0.73
ablishment   -0.72
emonic       -0.72
owship       -0.72
ulates       -0.71
erman        -0.70
Positive Logits
nton         0.87
Bros         0.67
Russo        0.66
icum         0.66
eru          0.66
Thornton     0.64
Weasley      0.64
sidx         0.64
Smy          0.63
bably        0.63
Activations Density: 0.007%