INDEX
Explanations
sentences that end in a period
Negative Logits
yip               -0.83
averages          -0.78
extinct           -0.73
thal              -0.72
soDeliveryDate    -0.70
dips              -0.69
selfie            -0.68
eatures           -0.68
shark             -0.67
bearer            -0.67
Positive Logits
Eventually        1.48
Suddenly          1.47
Soon              1.37
Then              1.30
Unable            1.25
Slowly            1.25
Instead           1.25
Fortunately       1.23
Within            1.21
Ultimately        1.21
Activations Density: 0.397%