INDEX
Explanations
interrogative phrases or questions
Negative Logits
it           -0.82
It           -0.63
它           -0.57
It           -0.51
He           -0.51
adə          -0.50
it           -0.50
ذلك          -0.50
bankası      -0.47
Everybody    -0.47
Positive Logits
we                 1.00
they               0.87
you                0.84
AssemblyCulture    0.82
']],               0.78
wij                0.77
wir                0.76
')['               0.73
}';                0.71
THESE              0.71
Activations Density 0.067%