Explanations
references to large-scale events or groups, particularly in the context of violence or social issues
Negative Logits
mates            -0.88
mates            -0.88
mate             -0.83
poons            -0.81
men              -0.77
iour             -0.77
yssey            -0.75
ilda             -0.74
poon             -0.73
rious            -0.72

Positive Logits
mobilization      0.98
destruction       0.92
adoption          0.86
fabrication       0.86
deforestation     0.83
looting           0.82
doping            0.81
blackout          0.80
dismantling       0.80
disruption        0.79
Activations Density 0.282%