Explanations
pronouns and their usages in context
Negative Logits
ounder     -0.17
fak        -0.16
ersive     -0.15
Contrib    -0.15
atti       -0.15
уча        -0.14
erp        -0.14
ersh       -0.14
757        -0.14
arb        -0.14
Positive Logits
interact       0.21
contact        0.21
interacts      0.19
encounter      0.19
deal           0.19
deal           0.19
encountered    0.18
trust          0.18
associated     0.18
bef            0.17
Activations Density: 0.138%