INDEX
Explanations
references to peace, reconciliation, and non-violence in various contexts
Negative Logits
aille       -0.17
λαν         -0.16
alie        -0.15
몰          -0.14
豪          -0.14
ileged      -0.14
uhn         -0.14
RIX         -0.14
ính         -0.14
assy        -0.14
Positive Logits
peace       0.76
Peace       0.69
peace       0.66
Peace       0.65
peaceful    0.55
pac         0.48
peacefully  0.44
pac         0.35
Pac         0.34
disarm      0.32
Activations Density 0.217%
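
As a rough illustration of how a density figure like this could be obtained, the sketch below assumes it is the share of sampled token positions on which the feature's activation is above zero. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical; the page itself does not state how the value was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions where the feature is active.

    Assumption: "active" means the activation exceeds `threshold`.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 217 active positions out of 100,000 tokens.
sample = [1.0] * 217 + [0.0] * 99_783
density = activation_density(sample)
print(f"{density:.3f}%")
```

A density this low would mean the feature is sparse, firing on only about two tokens in every thousand under the stated assumption.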