INDEX
Explanations
words related to peace, peaceful situations, and peaceful actions
references to peaceful interactions and protests
Negative Logits
Token      Logit
MAC        -0.81
attr       -0.79
odor       -0.77
GPU        -0.74
olls       -0.73
paralle    -0.73
asper      -0.72
ANA        -0.72
alach      -0.71
ripp       -0.71
Positive Logits
Token        Logit
peaceful     1.03
edIn         0.89
peace        0.84
ness         0.81
minded       0.79
peace        0.76
peacefully   0.74
Yemeni       0.74
resolution   0.74
nonviolent   0.73
Activations Density: 0.011%