Explanations
references to humanitarian aid and cultural identity
Negative Logits
Token       Logit
haps        -0.16
ocs         -0.15
ouncil      -0.14
другого     -0.14   (Russian: "other")
的一个      -0.14   (Chinese: "one of")
uhl         -0.14
งหมด        -0.13   (Thai fragment, as in ทั้งหมด "all")
isas        -0.13
urope       -0.13
chw         -0.13
Positive Logits
Token          Logit
both           0.56
both           0.53
BOTH           0.46
two            0.45
Both           0.43
Both           0.42
两个           0.41   (Chinese: "two")
respectively   0.40
_both          0.40
beiden         0.40   (German: "both")
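On feature dashboards of this kind (presumably for a sparse-autoencoder feature), the positive and negative logit lists are usually the feature's direct effect on the output vocabulary: its decoder vector multiplied by the model's unembedding matrix, then sorted. The page does not state how its values were computed, so the following is only a minimal sketch under that assumption; it ignores the final LayerNorm, and `logit_effects`, `w_dec_f`, `W_U`, and the toy vocabulary are hypothetical names rather than the page's actual code.

```python
import numpy as np

def logit_effects(w_dec_f, W_U, vocab, k=10):
    """Return the k most negative and k most positive direct logit effects.

    w_dec_f: (d_model,) decoder vector of one feature (hypothetical input)
    W_U:     (d_model, d_vocab) unembedding matrix; final LayerNorm is ignored
    vocab:   list of d_vocab token strings
    """
    effects = np.asarray(w_dec_f) @ np.asarray(W_U)   # (d_vocab,) effect per token
    order = np.argsort(effects)                       # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with d_model = 2 and a four-token vocabulary.
vocab = ["both", "two", "haps", "ocs"]
W_U = np.array([[0.5, 0.4, -0.2, -0.1],
                [0.0, 0.0,  0.0,  0.0]])
neg, pos = logit_effects(np.array([1.0, 0.0]), W_U, vocab, k=2)
print(neg)  # [('haps', -0.2), ('ocs', -0.1)]
print(pos)  # [('both', 0.5), ('two', 0.4)]
```

Because the effect is a single dot product per token, case and whitespace variants such as "both", "Both", and "BOTH" each get their own entry, which is consistent with the repeated words in the Positive Logits list.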
Activation Density: 0.501%
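Activation density is conventionally the share of sampled tokens on which the feature's activation is nonzero, so 0.501% corresponds to roughly 5 tokens in every 1,000. The page does not state its threshold or sample size; the sketch below assumes a strict `> 0` threshold, and `activation_density` and its input array are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of sampled tokens whose feature activation exceeds `threshold`.

    acts: 1-D array of one feature's activations over a token sample (hypothetical input)
    """
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy sample: the feature fires on 5 of 1,000 tokens.
acts = np.zeros(1000)
acts[:5] = 1.0
print(activation_density(acts))  # 0.5
```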