INDEX
Explanations
themes related to conflict and contradiction within societal norms and ideologies
Negative Logits
este         -0.18
Silver       -0.15
rep          -0.15
word         -0.15
yl           -0.15
intervals    -0.15
Nat          -0.15
iry          -0.14
nat          -0.14
ky           -0.14
Positive Logits
asıyla       0.16
/engine      0.16
enheim       0.16
çĤİ          0.16
uppen        0.16
adden        0.15
927          0.15
når          0.15
ìĽ           0.15
apus         0.15
Activations Density 0.237%