INDEX
Explanations
possessive pronouns and related attributes
Negative Logits
uzzi        -0.15
etur        -0.15
aal         -0.15
ilst        -0.14
ÎľÎŃ        -0.14
suff        -0.14
uil         -0.14
ños         -0.14
ulfilled    -0.14
intree      -0.14
Positive Logits
gonna       0.26
been        0.22
not         0.20
afraid      0.19
gone        0.18
gon         0.18
'e          0.18
going       0.17
done        0.17
’e          0.16
Activations Density 0.097%