Explanations
sentences that end with a period
Negative Logits
Token            Logit
consolidation    -0.76
duck             -0.73
transact         -0.72
closet           -0.72
unsuspecting     -0.71
peanuts          -0.71
consequential    -0.69
inactive         -0.69
doses            -0.69
absorption       -0.68
Positive Logits
Token            Logit
Then             1.01
Asked            0.99
But              0.98
That             0.97
However          0.97
Meaning          0.95
edu              0.94
"...             0.91
Later            0.91
Similarly        0.89
Activations Density: 0.153%