Explanations
comparisons or similes in the content
Negative Logits
ſtate    -1.29
purpoſe  -1.25
Efq      -1.17
ſever    -1.16
ſelves   -1.16
perſon   -1.15
myſelf   -1.14
raiſ     -1.14
itſelf   -1.14
houſe    -1.13
Positive Logits
'        0.67
!        0.67
(blank)  0.66
.        0.66
(blank)  0.63
’        0.62
/        0.62
I        0.60
,        0.58
a        0.58
Activations Density 0.067%