INDEX
Explanations
references to thresholds in various contexts
NEGATIVE LOGITS
―――――     -1.04
Efq       -1.04
Monfieur  -1.02
ſever     -1.01
ſtate     -1.01
purpoſe   -1.00
myſelf    -0.99
Jefus     -0.98
pleaſure  -0.98
Diſ       -0.98
POSITIVE LOGITS
A      0.75
no     0.73
se     0.72
X      0.71
brief  0.69
       0.69
L      0.69
Se     0.68
n      0.66
V      0.66
Activations Density 0.161%