INDEX
Explanations
references to wealth and socioeconomic status
Negative Logits
ugo       -0.18
opy       -0.15
診        -0.14
Scanner   -0.14
ynom      -0.14
umar      -0.14
ayım      -0.14
scrut     -0.14
ullan     -0.13
idar      -0.13
Positive Logits
æĸĻ       0.17
dips      0.14
iol       0.14
ework     0.14
iere      0.14
Dip       0.14
egt       0.13
geber     0.13
olo       0.13
434       0.13
Activations Density 0.273%