Explanations
personal pronouns reflecting individual or group identity
Negative Logits
expandindo        -1.13
kasarigan         -1.06
متعلقه            -0.88
Hentet            -0.83
featureID         -0.82
InjectAttribute   -0.81
resourceCulture   -0.80
ligiloj           -0.79
afficheront       -0.78
RegressionTest    -0.76
Positive Logits
He      0.67
He      0.65
We      0.59
he      0.55
Our     0.55
We      0.51
ोंने    0.50
Our     0.49
he      0.49
She     0.48
Activations Density 0.404%