Explanations
criticisms or concerns regarding decisions or actions
phrases expressing ethical concerns and societal criticisms
Negative Logits
ゴン      -1.08
エル      -0.90
æ©        -0.80
pione     -0.74
gur       -0.73
ツ        -0.72
exting    -0.69
eal       -0.69
ゼ        -0.68
ヴァ      -0.68
Positive Logits
we            1.17
[             1.15
..."          1.09
somebody      1.05
anybody       0.98
…"            0.98
Defendants    0.97
,"            0.95
there         0.93
Defendant     0.93
Activations Density: 0.461%