INDEX
Explanations
pronouns and their connections to actions or relationships
Negative Logits
certainly  -0.18
iek  -0.17
oret  -0.16
avor  -0.15
缸当  -0.15
illis  -0.15
mise  -0.14
kin  -0.14
rets  -0.13
켓  -0.13
Positive Logits
bother  0.27
chose  0.27
suddenly  0.26
chosen  0.25
such  0.24
choose  0.24
इतन  0.23
so  0.23
bothering  0.22
chosen  0.22
Activation density: 0.215%