INDEX
Explanations
proper nouns, specifically related to political figures
references to specific situations or events
Negative Logits
EEE       -0.69
cffff     -0.68
heon      -0.67
pherd     -0.65
soType    -0.65
axter     -0.64
eln       -0.63
wcsstore  -0.63
ð         -0.62
à¹        -0.62
Positive Logits
istan     0.82
aires     0.77
rift      0.71
tab       0.68
igmatic   0.67
off       0.66
rama      0.66
Instit    0.66
zan       0.64
sag       0.63
Activations Density 0.000%