Explanations
references to a specific character, predominantly focusing on their actions and qualities
Negative Logits
Token      Logit
hommes     -0.61
partial    -0.60
getError   -0.59
Drago      -0.57
ín         -0.56
ados       -0.56
Frick      -0.56
uksi       -0.55
options    -0.55
partial    -0.55
Positive Logits
Token      Logit
she        1.94
She        1.80
she        1.72
She        1.69
SHE        1.48
SHE        1.42
shes       1.40
herself    1.37
但她        1.11
her        1.10
Activations Density: 0.052%