INDEX
Explanations
mentions of support for various causes or individuals
references to political or social backing
Negative Logits
sweat       -0.67
Rim         -0.64
ModLoader   -0.62
Ir          -0.60
Reeves      -0.59
selves      -0.58
unbeliev    -0.58
Hebdo       -0.58
neys        -0.58
burg        -0.58
Positive Logits
itism       0.97
Support     0.82
arity       0.80
ability     0.79
Supports    0.74
leader      0.74
ãĥĨ         0.73
asio        0.72
byn         0.72
ative       0.71
Activations Density 0.052%