INDEX
Explanations
pronouns that refer to groups of people, particularly in relation to actions or sentiments
Negative Logits
Token    Logit
was      -0.22
(s       -0.18
(es      -0.18
Was      -0.17
itself   -0.16
Was      -0.15
amp      -0.15
باشد     -0.14
nbsp     -0.14
[s       -0.13
Positive Logits
Token    Logit
’re      0.69
're      0.63
’ve      0.52
are      0.52
've      0.51
’ll      0.41
'll      0.39
aren     0.39
’d       0.35
were     0.35
Activations Density: 0.822%
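
The page does not define the density figure. A minimal sketch of one common reading, assuming it is the share of sampled token positions on which the feature has a nonzero activation. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds threshold.

    Assumption: "density" means the fraction of positions where the feature
    fires at all. The page itself does not state the definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 1,000 token positions, 8 of which fire.
sample = [0.0] * 992 + [1.3] * 8
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.822% would mean the feature fires on roughly 8 of every 1,000 sampled tokens.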