Explanations
pronouns or words related to personal actions or attributes
pronouns that indicate personal or collective identity
Negative Logits
church        -0.70
bender        -0.65
aston         -0.61
Elim          -0.59
Eleven        -0.58
Nationwide    -0.58
luster        -0.57
whel          -0.57
quartered     -0.56
Dres          -0.55
Positive Logits
've             0.94
perceive        0.89
learnt          0.88
're             0.88
ate             0.87
aspire          0.82
learned         0.82
meant           0.81
saw             0.80
accomplished    0.80
Activations Density 0.136%