INDEX
Explanations
phrases that indicate societal rules or norms
Negative Logits
الحياه    -0.75
Franks    -0.68
riezmann    -0.65
fVar    -0.63
XmlIgnore    -0.63
Irma    -0.60
المعيارى    -0.59
Bobo    -0.58
UrlResolution    -0.58
Brody    -0.58
Positive Logits
although    0.76
although    0.73
awtextra    0.67
however    0.63
however    0.60
obwohl    0.59
ولك    0.59
€“    0.59
bulunabilir    0.59
ніципалі    0.57
Activations Density 0.082%