INDEX
Explanations
timestamps in a specific format
punctuation marks, particularly periods
Negative Logits
unintended    -0.68
tsun          -0.67
execut        -0.65
landscape     -0.64
portraits     -0.63
behavi        -0.62
handwriting   -0.62
persecuted    -0.62
surviv        -0.62
ecology       -0.62
Positive Logits
Downloadha             0.88
[token text missing]   0.88
css                    0.79
[token text missing]   0.79
Unloaded               0.78
Accessed               0.76
Org                    0.75
xxx                    0.74
org                    0.74
ItemTracker            0.73
Activations Density 0.050%