INDEX
Explanations
dates in specific formats
periods indicating the end of statements or sentences
Negative Logits
pudding     -0.85
fishes      -0.80
everywhere  -0.77
sauces      -0.76
minion      -0.74
hats        -0.72
necks       -0.71
opponent    -0.71
instinct    -0.70
heav        -0.70
Positive Logits
However        1.01
Afterwards     0.99
Since          0.96
Additionally   0.95
According      0.95
Meanwhile      0.93
Previously     0.92
Presumably     0.92
Alternatively  0.90
Similarly      0.89
Activations Density: 0.521%