INDEX
Explanations
phrases indicating location and accessibility
Negative Logits
“       -0.93
n       -0.88
her     -0.84
en      -0.83
el      -0.82
a       -0.81
‘       -0.81
sa      -0.79
s       -0.78
Pet     -0.78
Positive Logits
myſelf        1.49
Monfieur      1.49
itſelf        1.47
Efq           1.43
becauſe       1.43
purpoſe       1.35
himſelf       1.34
Anſ           1.32
themſelves    1.32
auffi         1.30
Activations Density: 0.079%