Explanations
punctuation marks, specifically periods and commas
Negative Logits
purpoſe          -1.00
myſelf           -0.93
themſelves       -0.92
reaſon           -0.92
pleaſure         -0.91
ſtate            -0.91
perfons          -0.90
himſelf          -0.89
uſed             -0.89
itſelf           -0.89
Positive Logits
...               0.94
…                 0.88
...               0.75
…                 0.74
....              0.66
……                0.62
......            0.59
“...              0.58
--                0.58
CloseOperation    0.58
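The two lists rank vocabulary tokens by how strongly this feature pushes their logits down (Negative Logits) or up (Positive Logits). Below is a minimal sketch of how such a ranking could be produced. It assumes, and the dashboard does not state, that each score is the dot product of the feature's decoder direction with the token's unembedding vector. The dashboard's own scaling is not reproduced: the top negative entry reads exactly -1.00, which hints at some normalization. `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


# Assumption: a token's score is the dot product of the feature's decoder
# direction with that token's unembedding vector (hypothetical setup).
def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Score every token by how much the feature direction raises its logit."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, k most negative) tokens, strongest first."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # most negative scores first
    positive = ranked[::-1][:k]    # most positive scores first
    return positive, negative


# Toy 2-dimensional example (made-up vectors, not real model weights).
decoder_dir = [1.0, 0.0]
unembed = {
    "...": [0.9, 0.1],
    "…": [0.8, 0.0],
    "the": [0.0, 1.0],
    "ſtate": [-0.9, 0.0],
    "purpoſe": [-1.0, 0.5],
}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print(positive)  # [('...', 0.9), ('…', 0.8)]
print(negative)  # [('purpoſe', -1.0), ('ſtate', -0.9)]
```

Sorting once and slicing from both ends yields both lists from a single pass. On a real vocabulary, a partial selection such as `heapq.nlargest` would avoid sorting every token.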
Activation Density 0.188%
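Activation density is the share of tokens on which the feature fires. Below is a minimal sketch of that calculation. It assumes a token counts as active when its activation is strictly above a threshold of 0; the dashboard does not state its threshold. `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation is strictly above `threshold`.

    Assumption: "active" means activation > threshold, with threshold 0 by default.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: 2 active tokens out of 1,000 gives a density of 0.2%.
acts = [0.0] * 998 + [0.5, 1.2]
print(activation_density(acts))  # 0.2
```

At 0.188%, the feature would be active on roughly 188 of every 100,000 tokens.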