INDEX
Explanations
references to cultural practices and traditions, particularly those involving dress codes
Negative Logits
Token       Logit
CORD        -0.17
amage       -0.14
Jer         -0.14
auen        -0.14
776         -0.14
ายน         -0.14
adius       -0.13
cord        -0.13
Traverse    -0.13
یره         -0.13
Positive Logits
Token       Logit
hij         0.46
Hij         0.41
veil        0.36
bur         0.34
ve          0.32
ni          0.29
Ve          0.28
covering    0.28
modest      0.28
covering    0.27
Activations Density 0.049%