INDEX
Explanations
dates written in the format of month followed by year
end punctuation in the text
Negative Logits
purse         -0.80
pudding       -0.77
fishes        -0.74
everywhere    -0.73
jersey        -0.71
rall          -0.71
hats          -0.71
naughty       -0.67
elusive       -0.67
disemb        -0.67
Positive Logits
However       1.03
Since         1.01
That          1.01
Afterwards    1.00
Additionally  0.93
Needless      0.93
Then          0.93
During        0.93
Previously    0.91
According     0.91
Activations Density 0.547%