INDEX
Explanations
expressions of reluctance or unwillingness
Negative Logits
itſelf     -1.08
pleaſure   -1.07
purpoſe    -1.05
Efq        -1.02
ſta        -1.01
myſelf     -1.01
houſe      -1.00
poffe      -0.96
ſtate      -0.94
Jefus      -0.94
Positive Logits
ever       0.60
or         0.57
/          0.55
risk       0.54
yet        0.52
me         0.52
,          0.50
non        0.49
any        0.49
re         0.49
Activations Density: 0.267%