INDEX
Explanations
references to newspaper names and titles
references to a particular newspaper or media outlet
Negative Logits
afety     -0.86
ashtra    -0.71
jab       -0.71
azard     -0.71
iard      -0.68
condu     -0.65
enhagen   -0.65
utical    -0.64
tery      -0.64
Topic     -0.63
Positive Logits
Savior     0.73
Irish      0.69
Sax        0.69
¥µ         0.69
icket      0.66
Gleaming   0.64
otos       0.64
brightest  0.63
Strip      0.63
Legendary  0.62
Activations Density 0.115%