Explanations
instances of governmental or political power dynamics related to entities and actions
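Negative Logits
&=&       -0.74
fallu     -0.65   (French: "had to", past participle of "falloir")
&=&\      -0.62
&=&       -0.60
她們      -0.57   (Chinese: "they", feminine)
करती      -0.54   (Hindi: "does", feminine)
которое   -0.53   (Russian: "which", neuter)
Оно       -0.52   (Russian: "it", neuter)
它們      -0.52   (Chinese, traditional: "they", non-human)
它们      -0.51   (Chinese, simplified: "they", non-human)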
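Positive Logits
his       3.78
he        3.11
him       3.05
his       2.88
himself   2.81
彼は      2.77   (Japanese: "he" + topic particle)
彼の      2.72   (Japanese: "his")
彼が      2.60   (Japanese: "he" + subject particle)
himself   2.30
他的      2.30   (Chinese: "his")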
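The two lists above read as the most negative and most positive tokens ranked by this feature's effect on the output logits. How the effect values themselves are computed is not stated here, so the sketch below only illustrates the ranking step, assuming the effects are available as (token, value) pairs. Pairs are used instead of a dict because the lists contain repeated surface strings such as "his" and "himself". The function name `top_logits` is hypothetical.

```python
import heapq
from typing import List, Tuple

# A (token, logit effect) pair; repeated surface strings are allowed.
Pair = Tuple[str, float]

def top_logits(effects: List[Pair], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Hypothetical helper: return the k most positive and k most negative effects."""
    positive = heapq.nlargest(k, effects, key=lambda p: p[1])   # largest first
    negative = heapq.nsmallest(k, effects, key=lambda p: p[1])  # most negative first
    return positive, negative

# A few values taken from the lists above.
effects = [("his", 3.78), ("he", 3.11), ("him", 3.05), ("彼は", 2.77),
           ("fallu", -0.65), ("它们", -0.51), ("&=&", -0.74)]
pos, neg = top_logits(effects, 2)
print(pos)  # [('his', 3.78), ('he', 3.11)]
print(neg)  # [('&=&', -0.74), ('fallu', -0.65)]
```

With k equal to 10, the same call over a full-vocabulary effect list would yield two lists shaped like the ones shown.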
Activation Density: 5.268%
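Activation density is usually read as the share of sampled tokens on which the feature fires. The page states neither the firing threshold nor the token sample, so the sketch below assumes "fires" means an activation strictly above 0.0. The function name `activation_density` and the synthetic activations are hypothetical illustrations, not data from this feature.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Synthetic example: 53 of 1,000 tokens fire.
acts = [0.0] * 947 + [0.8] * 53
density = activation_density(acts)
print(f"{density:.3f}%")  # 5.300%
```

At roughly 5% density, the feature fires on about one token in twenty.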