INDEX
Explanations
statements related to significant environmental or historical changes
Negative Logits
她们
-0.24
它们
-0.18
Ї
-0.18
шила
-0.18
yourselves
-0.16
دهم
-0.15
ovalo
-0.14
чила
-0.14
шло
-0.14
eles
-0.13
Positive Logits
he
1.20
his
1.06
他
0.91
himself
0.91
his
0.86
him
0.82
他的
0.79
，他
0.77
他
0.74
。他
0.73
Activations Density 4.544%