INDEX
Explanations
names of individuals mentioned in news articles
Negative Logits
indo        -0.63
abase       -0.63
catentry    -0.61
contrace    -0.60
itiveness   -0.59
>[          -0.58
apor        -0.56
prelim      -0.56
atto        -0.56
sexes       -0.55
Positive Logits
velt        0.69
eworthy     0.69
¿           0.68
ħĭ          0.67
aiden       0.67
hani        0.65
bery        0.64
Collins     0.63
Celt        0.63
Myster      0.63
Activations Density: 0.088%