Explanations
references to personal pronouns and their associated actions in context
Negative Logits
Token      Logit
isay       -0.18
lus        -0.16
loo        -0.16
ubbo       -0.15
lexible    -0.15
dyn        -0.15
argon      -0.14
ковод      -0.14
imas       -0.14
rika       -0.14
Positive Logits
Token      Logit
self       0.22
mình       0.22
self       0.21
selves     0.21
-self      0.20
itself     0.20
self       0.19
:self      0.18
=self      0.18
elves      0.17
Activations Density: 0.036%