INDEX
Explanations
expressions related to holding law enforcement accountable
references to right-wing ideologies or groups
Negative Logits
Plum        -0.70
Noir        -0.65
KH          -0.63
Vaughn      -0.62
Guru        -0.61
Solitaire   -0.60
Pell        -0.60
Moj         -0.59
Contest     -0.58
Chill       -0.58
Positive Logits
hander      1.28
handed      1.23
leaning     1.22
wing        1.21
angled      1.09
aligned     1.08
sided       1.08
thinking    1.07
hand        1.05
click       1.01
Activations Density 0.027%