INDEX
Explanations
first person pronouns and associated verbs expressing personal actions
first-person and second-person pronouns
Negative Logits
quartered  -0.74
rehend     -0.74
imum       -0.71
aign       -0.71
20439      -0.68
cephal     -0.66
angan      -0.62
handled    -0.62
Virtue     -0.61
ignt       -0.61
Positive Logits
'll        0.79
iral       0.74
forgot     0.73
kidding    0.73
're        0.73
've        0.72
ain        0.71
vom        0.69
Didn       0.68
Naw        0.68
Activations Density 0.306%