Explanations
mentions of specific people or organizations in a context related to news or politics
references to political actions and policies
Negative Logits
Token      Logit
rian       -0.45
rylic      -0.44
Frag       -0.44
gravity    -0.43
avez       -0.42
lag        -0.41
ividual    -0.40
itamin     -0.39
ursed      -0.39
riel       -0.38
Positive Logits
Token      Logit
corrid     0.50
entimes    0.49
ĵĺ         0.49
wcs        0.48
thous      0.47
terness    0.45
Gutierrez  0.45
newcom     0.44
©¶æ¥µ      0.43
suspic     0.43
Activations Density: 9.358%