INDEX
Explanations
themes related to belief systems and societal divisions
Negative Logits
Token     Logit
ạ         -0.13
segue     -0.13
edback    -0.12
zar       -0.12
amac      -0.12
Âł        -0.11   (byte-level form of U+00A0, non-breaking space)
byn       -0.11
unin      -0.11
ane       -0.11
ünüz      -0.11
Positive Logits
Token         Logit   Decoded
in            0.65    in
åľ¨           0.44    在
în            0.43    în
ÙģÙĬ          0.39    في
åľ¨           0.39    在
à¹ĥà¸Ļ        0.39    ใน
ÙģÙī          0.32    فى
در            0.30    در
ï¼Įåľ¨        0.30    ，在
à¹ĥà¸Ļ        0.30    ใน
Activations Density 0.322%
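
Several tokens in the logit lists above (for example åľ¨ and ÙģÙĬ) appear to be raw vocabulary strings from a byte-level BPE tokenizer rather than readable text. The sketch below shows one way to map them back to UTF-8. It assumes the tokenizer uses the GPT-2-style byte-to-character table, which is not stated on this page; the helper names `bytes_to_unicode` and `decode_token` are illustrative, and strings that do not decode cleanly are returned unchanged.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table (assumed GPT-2-style byte-level BPE)."""
    keep = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    mapping = {b: chr(b) for b in keep}
    extra = 0
    for b in range(256):
        if b not in mapping:
            # Non-printable bytes are shifted to code points 256 and above.
            mapping[b] = chr(256 + extra)
            extra += 1
    return mapping


UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Return the UTF-8 text for a byte-level token, or the token itself."""
    if any(ch not in UNICODE_TO_BYTE for ch in token):
        return token  # already readable text, e.g. Persian script
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return token  # not a complete UTF-8 sequence; leave as listed


if __name__ == "__main__":
    for tok in ["in", "åľ¨", "ÙģÙĬ", "à¹ĥà¸Ļ", "ï¼Įåľ¨", "در"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Read this way, the top positive logits are the preposition "in" in English, Chinese, Romanian, Arabic, Thai, and Persian.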