Explanations
references to citations and academic validation
Negative Logits
Token       Logit
its         -1.05
Its         -0.91
яке         -0.90
Its         -0.89
Оно         -0.82
its         -0.81
которое     -0.81
它          -0.78
它的        -0.78
it          -0.72
Positive Logits
Token       Logit
celles       1.25
herself      1.20
ones         1.12
lesquelles   1.08
herself      1.00
ellas        0.99
Elles        0.96
she          0.92
éstas        0.90
Elles        0.88
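The page does not say how the two logit lists are produced. One common approach, assumed here, is direct logit attribution: take the dot product of the feature's output direction with each token's unembedding column, then report the most positive and most negative tokens. The sketch below shows that ranking step on made-up 2-dimensional vectors. `direct_logit_effects`, `top_and_bottom` and all the numbers are hypothetical and are not taken from this dashboard.

```python
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def direct_logit_effects(
    direction: Sequence[float], unembed: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """Dot each token's (hypothetical) unembedding column with the feature direction."""
    return {
        token: sum(d * u for d, u in zip(direction, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example: every vector here is invented for illustration only.
direction = [1.0, -0.5]
unembed = {
    "she": [0.8, -0.4],
    "herself": [1.0, -0.2],
    "the": [0.1, 0.2],
    "it": [-0.6, 0.2],
    "its": [-0.9, 0.3],
}
positive, negative = top_and_bottom(direct_logit_effects(direction, unembed), k=2)
print("Positive:", positive)
print("Negative:", negative)
```

In this toy setup "herself" and "she" rank as the most positive tokens and "its" and "it" as the most negative. Whether the real dashboard applies extra normalization (for example, folding in a final layer norm) is not stated on the page.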
Activation Density: 0.035%
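Activation density is usually read as the share of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero. The sample corpus and the threshold behind the 0.035% figure are not given on the page, and `activation_density` is a hypothetical helper. With that definition, 35 active tokens out of 100,000 would give 0.035%.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 35 nonzero activations among 100,000 tokens.
sample = [0.0] * 99_965 + [0.7] * 35
print(f"{activation_density(sample):.3f}%")
```

A very low density like this means the feature is sparse: it stays at zero on almost every token.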