Explanations
terms related to political ideologies and extremism
references to extremist political groups
Negative Logits
Cola                 -0.76
Doodle               -0.73
Anonymous            -0.70
..............       -0.70
||                   -0.69
natureconservancy    -0.69
ACTIONS              -0.67
ADRA                 -0.67
BB                   -0.65
Moons                -0.63
Positive Logits
ishly                 0.86
reaching              0.81
ibaba                 0.79
coe                   0.77
ctic                  0.76
distances             0.74
rency                 0.73
fetched               0.72
agher                 0.68
inent                 0.68
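Dashboards of this kind commonly derive the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k lowest and k highest scores. The sketch below assumes exactly that. The names logit_effects and top_and_bottom are hypothetical. The page does not say whether its values are raw dot products or normalized, so treat this as an illustration of the ranking step rather than a reproduction of these numbers.

```python
def logit_effects(decoder_vec, unembed, vocab):
    """Project an (assumed) SAE decoder direction through the unembedding.

    decoder_vec: list of d_model floats, the feature's decoder row (assumption)
    unembed: d_model rows, each a list with one float per vocab token (W_U)
    vocab: token strings, one per column of unembed
    Returns (token, score) pairs in vocab order.
    """
    n_vocab = len(vocab)
    scores = [0.0] * n_vocab
    # score[j] = sum over model dimensions of decoder_vec[i] * W_U[i][j]
    for d, w_row in zip(decoder_vec, unembed):
        for j in range(n_vocab):
            scores[j] += d * w_row[j]
    return list(zip(vocab, scores))


def top_and_bottom(effects, k):
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


effects = logit_effects([1.0, 0.0], [[0.5, -0.2, 0.9], [1.0, 1.0, 1.0]], ["a", "b", "c"])
neg, pos = top_and_bottom(effects, k=1)
print(neg, pos)
```

A large negative score means that adding the feature's direction to the residual stream pushes that token's logit down, and a large positive score means it pushes the logit up.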
Activation Density: 0.118%
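Activation density is usually read as the percentage of evaluated tokens on which the feature fires. A density of 0.118% therefore corresponds to roughly 1.18 activations per 1,000 tokens, or about 1 in 850. The minimal sketch below assumes "fires" means an activation strictly above zero. The threshold parameter and the function name activation_density are illustrative assumptions, not the dashboard's documented definition.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    activations: one float per token (assumed input format)
    threshold: assumed firing cutoff; 0.0 treats any positive value as firing
    """
    if not activations:
        raise ValueError("need at least one token activation")
    fires = sum(1 for a in activations if a > threshold)
    return 100.0 * fires / len(activations)


print(activation_density([0.0, 0.0, 0.3, 0.0]))
```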