INDEX
Explanations
themes related to cultural diversity and social issues
Negative Logits
Token    Logit
femin    -0.15
atak     -0.14
tvrt     -0.14
ubu      -0.14
emin     -0.14
avin     -0.13
nez      -0.13
orgot    -0.13
iez      -0.13
semi     -0.13
Positive Logits
Token        Logit
tolerance    0.30
tol          0.25
peace        0.23
tolerant     0.23
olerance     0.23
diversity    0.22
toler        0.21
unity        0.21
peaceful     0.21
Peace        0.20
Activations Density: 0.312%