INDEX
Explanations
pronouns and their usage in the context of relationships and self-reference
Negative Logits
oren      -0.16
uzzi      -0.16
owan      -0.15
mer       -0.15
reet      -0.14
reve      -0.14
éc        -0.14
ÅĻiv      -0.14
rist      -0.14
ilton     -0.13
Positive Logits
enet      0.15
weigh     0.14
ctrine    0.14
isque     0.14
ierten    0.14
tainment  0.14
.bd       0.14
ledged    0.14
Ģë¡ľ      0.14
ideo      0.14
Activations Density 0.032%
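
As a rough illustration of what a density figure like this measures, here is a minimal sketch. It assumes that density means the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are assumptions made for illustration, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of positions where the feature fires at all.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 10,000 token positions, 3 of which activate the feature.
sample = [0.0] * 9997 + [1.2, 0.4, 2.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.032% would mean the feature fires on roughly 3 of every 10,000 tokens.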