Explanations
Instances of personal pronouns and their associated verbs or actions.
Negative Logits
ega        -0.16
ahn        -0.15
kil        -0.15
negatives  -0.14
PLUS       -0.14
cente      -0.14
.plus      -0.14
otts       -0.14
Express    -0.14
//{{       -0.13
Positive Logits
combe      0.15
ylland     0.15
«          0.14
displ      0.14
ernet      0.14
eter       0.14
*);↵↵      0.14
seau       0.14
atorio     0.13
bias       0.13
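The page does not say how the positive and negative logit lists are produced. A common approach for dictionary (sparse-autoencoder) features, assumed here rather than confirmed by the source, is to take the dot product of the feature's decoder direction with each token's unembedding vector, then list the k lowest and k highest scores. The sketch below uses hypothetical names (`logit_effects`, `top_and_bottom`, `decoder_dir`, `unembed`) and a toy 2-dimensional model; it does not reproduce the values listed above, which may include extra normalization.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Score each vocabulary token by decoder_dir . unembed[:, token].

    Assumption: `unembed` has shape (d_model, vocab_size), i.e. one row per
    residual-stream dimension and one column per token.
    """
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most negative k, most positive k) as (token, score) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]           # ascending
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]  # descending
    return negative, positive


# Toy usage: d_model = 2, vocabulary of 4 hypothetical tokens.
effects = logit_effects([1.0, 0.5], [[0.2, -0.4, 0.1, 0.0],
                                     [0.2, 0.2, -0.6, 0.4]])
negative, positive = top_and_bottom(effects, ["a", "b", "c", "d"], k=2)
print(negative, positive)
```

Ranking by the raw dot product means a token appears in the negative list when the feature pushes its logit down, which matches how the two lists above are split by sign.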
Activation density: 0.158%
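Assuming "activation density" means the share of sampled tokens on which the feature fires (activation above zero), 0.158% corresponds to roughly 158 active tokens per 100,000. A minimal sketch of that calculation follows; the function name and the `threshold` parameter are hypothetical, and the dashboard's exact definition is not given in the source.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 158 firing tokens out of 100,000 sampled tokens -> about 0.158 (percent).
sample = [1.0] * 158 + [0.0] * (100_000 - 158)
print(activation_density(sample))
```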