Explanations
mentions of specific names, particularly related to sports or entertainment
Negative Logits
Token       Logit
atari       -0.81
Purg        -0.76
orld        -0.74
Archdemon   -0.73
aviour      -0.71
Nadu        -0.70
ulhu        -0.70
haunt       -0.65
imum        -0.63
ilight      -0.60
Positive Logits
Token       Logit
enhagen     1.20
Gaal        0.68
ner         0.66
lund        0.65
Dos         0.65
esson       0.64
Sawyer      0.59
veland      0.59
Surge       0.57
impression  0.57
Activations Density: 0.137%