Explanations
dates in the format "Month Day" or just the month
occurrences of the word "Mon" in various contexts
Negative Logits
Token        Logit
Norn         -0.89
ACTED        -0.78
NRS          -0.68
rating       -0.67
LESS         -0.67
REE          -0.67
ENG          -0.65
avorite      -0.65
lessly       -0.65
atory        -0.64
Positive Logits
Token        Logit
roe           1.07
ths           1.03
strous        0.96
ument         0.95
etheless      0.90
STER          0.83
cipled        0.83
itored        0.82
icol          0.82
cture         0.80
Activation Density: 0.016%