Explanations
phrases indicating uncertainty or lack of knowledge
Negative Logits
pleaſure -0.70
ergo -0.69
myſelf -0.69
Anſ -0.68
Indoch -0.67
GetEnumerator -0.66
Hochspringen -0.66
leſs -0.65
Diſ -0.65
MSD -0.65
Positive Logits
barely 0.85
even 0.85
siquiera 0.78
principalTable 0.71
Even 0.69
EVEN 0.67
InputBorder 0.65
half 0.62
eens 0.61
even 0.60
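The dashboard does not say how the logit values above were produced. A common approach for sparse-autoencoder feature dashboards, assumed here rather than confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix and list the lowest- and highest-scoring tokens. The sketch below shows that projection on toy numbers; `top_logits`, `decoder_dir`, `W_U` (shape d_model x n_vocab) and the four-token vocabulary are all hypothetical.

```python
def top_logits(decoder_dir, W_U, vocab, k=3):
    """Hypothetical sketch: project a feature's decoder direction through an
    unembedding matrix W_U (rows = model dimensions, columns = vocabulary) and
    return the k most negative and k most positive (token, logit) pairs."""
    n_vocab = len(vocab)
    # Logit for token j is the dot product of decoder_dir with column j of W_U.
    logits = [
        sum(d * W_U[i][j] for i, d in enumerate(decoder_dir))
        for j in range(n_vocab)
    ]
    # Sort token indices from lowest to highest logit.
    order = sorted(range(n_vocab), key=lambda j: logits[j])
    negative = [(vocab[j], logits[j]) for j in order[:k]]
    # Highest logits first, mirroring a "positive logits" list.
    positive = [(vocab[j], logits[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy example: a 2-dimensional model with a 4-token vocabulary (made-up values).
neg, pos = top_logits(
    decoder_dir=[1.0, 0.0],
    W_U=[[0.5, -0.2, 0.9, -0.7],
         [0.3, 0.3, 0.3, 0.3]],
    vocab=["a", "b", "c", "d"],
    k=2,
)
print("negative:", neg)
print("positive:", pos)
```

Because a single token string can map to several vocabulary entries (for example with and without a leading space), a list built this way can show the same visible string twice with different values.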
Activation Density: 0.095%
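Activation density is usually read as the fraction of tokens on which the feature is active. The sketch below assumes "active" means an activation strictly above zero, which the source does not state. The function name `activation_density` and the token counts in the example are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical sketch: fraction of tokens whose activation exceeds
    `threshold` (assumed here to be 0, i.e. the feature fired at all)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 19 active tokens out of 20,000.
sample = [0.0] * 19_981 + [1.0] * 19
density = activation_density(sample)
print(f"{density:.3%}")
```

Under these assumptions, a density of 0.095% means the feature fires on roughly 1 token in every 1,050.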