INDEX
Explanations
specific names and proper nouns related to politics, media, and academia
Negative Logits
ゼウス        -0.70
mble        -0.69
xual        -0.66
ITNESS      -0.56
thumbnail   -0.54
覚醒          -0.52
conclud     -0.51
ngth        -0.50
nesday      -0.48
vironment   -0.48
Positive Logits
unit        0.75
otte        0.67
ued         0.64
bard        0.64
Lag         0.64
otle        0.64
ueless      0.63
ophe        0.60
esian       0.59
het         0.58
Activations Density 12.521%