INDEX
Explanations
references to the Western world or Western culture
Negative Logits
تضيفلها    -0.58
findpost    -0.52
SequentialGroup    -0.50
跳转至    -0.49
cocoa    -0.46
propOrder    -0.45
好    -0.45
ανά    -0.44
szy    -0.44
patches    -0.43
Positive Logits
Western    1.26
western    1.23
Western    1.16
WESTERN    1.12
western    1.12
WESTERN    0.97
occidental    0.88
wester    0.86
wester    0.80
occidentale    0.75
Activations Density 0.105%