Explanations
terms related to legal liability
Negative Logits
(                   -0.82
,                   -0.78
-                   -0.75
in                  -0.75
ub                  -0.75
od                  -0.69
(token not shown)   -0.69
-                   -0.68
du                  -0.68
a                   -0.67
Positive Logits
myſelf              1.91
purpoſe             1.82
Jefus               1.75
itſelf              1.71
Efq                 1.68
Majefty             1.67
pleaſure            1.66
Theſe               1.61
Anſ                 1.61
raiſ                1.60
Activations Density 0.070%