INDEX
Explanations
proper nouns related to individuals, possibly in a news or political context
Negative Logits
terday     -0.74
EED        -0.72
conclud    -0.71
eele       -0.64
anwhile    -0.63
Reference  -0.61
Sax        -0.59
ENTS       -0.58
henko      -0.58
ateral     -0.57
Positive Logits
rique      1.19
ning       1.05
riks       0.95
rik        0.94
sel        0.93
lein       0.92
agar       0.92
nery       0.90
ricks      0.88
ners       0.88
Activations Density: 6.595%