INDEX
Explanations
references to causality and violation of rules or expectations
Follows prepositions or punctuation
pronouns and names
Negative Logits
يتيمه        -1.03
}$           -0.84
Normdatei    -0.82
تانيه        -0.79
estekak      -0.72
tfsi         -0.72
tserrat      -0.71
manjaro      -0.70
cdti         -0.68
êques        -0.68
Positive Logits
her          1.31
she          1.15
his          1.13
him          1.01
he           0.99
their        0.92
they         0.83
She          0.83
He           0.81
she          0.79
Activations Density: 3.943%