INDEX
Explanations
mentions of specific groups of people or entities
references to individuals in positions of authority or societal roles
Negative Logits
thia        -0.55
Lago        -0.52
iland       -0.52
utical      -0.50
itored      -0.50
oppable     -0.46
é¾įå        -0.46
ourke       -0.45
ļé          -0.45
chuk        -0.45
Positive Logits
iest        0.74
liest       0.70
hest        0.68
portion     0.61
est         0.52
aspect      0.52
tremend     0.51
keyword     0.51
matchup     0.50
icter       0.50
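Token/logit lists like the two above are commonly obtained by projecting a feature's output direction through the model's unembedding matrix and sorting the per-token scores. The page does not state how its values were computed, so the following is only a minimal sketch under that assumption. The function names, the toy vocabulary (`tok_a` to `tok_d`), the direction, and the matrix are all hypothetical and are not taken from this feature.

```python
def logit_effects(direction, unembed, vocab):
    """Score each token by the dot product of the feature direction
    with that token's column of the unembedding matrix."""
    scores = []
    for j, token in enumerate(vocab):
        score = sum(direction[i] * unembed[i][j] for i in range(len(direction)))
        scores.append((token, score))
    return scores


def top_and_bottom(scores, k):
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    most_positive = ranked[:k]
    most_negative = ranked[-k:][::-1]  # most negative first
    return most_positive, most_negative


# Hypothetical toy values: a 2-dimensional direction and a 2 x 4 unembedding matrix.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
direction = [1.0, -0.5]
unembed = [
    [0.6, 0.1, -0.4, 0.2],
    [-0.2, 0.4, 0.3, 0.0],
]

scores = logit_effects(direction, unembed, vocab)
top, bottom = top_and_bottom(scores, k=2)
for token, score in top:
    print(f"{token:8s} {score:+.2f}")
for token, score in bottom:
    print(f"{token:8s} {score:+.2f}")
```

In a real model the direction and the matrix would have the model's hidden size, and `k` would be 10 to match the lists shown here.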
Activations Density: 5.113%
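The page does not define the density figure. One common reading is the fraction of sampled tokens on which the feature's activation is greater than zero. The sketch below assumes that reading. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of sampled activations strictly above the threshold."""
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 20 token positions, 2 of which activate the feature.
sample = [0.0] * 20
sample[2] = 1.3
sample[6] = 0.2

density = activation_density(sample)
print(f"{density:.3%}")  # prints 10.000%
```

Under this assumption, a density of 5.113% would mean the feature fires on roughly one in every twenty sampled tokens.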