INDEX
Explanations
forms of do and be in questions
Negative Logits
.*       0.38
رو       0.37
是他     0.34
.(*      0.34
iség     0.33
Count    0.33
ندارد    0.33
مة       0.33
Inner    0.33
*.       0.32
Positive Logits
it       0.96
they     0.83
we       0.78
you      0.76
he       0.64
this     0.63
?),      0.62
the      0.59
?).      0.59
?)       0.59
Activations Density 0.086%