INDEX
Explanations
specific years and dates
instances of dates and times
Negative Logits
lain       -0.76
ascript    -0.70
spread     -0.68
safety     -0.64
abad       -0.64
arians     -0.63
average    -0.63
abled      -0.61
namese     -0.61
stable     -0.61
Positive Logits
fray       0.69
anew       0.69
Ń·         0.69
Gerr       0.67
OIL        0.67
Gutenberg  0.64
earnest    0.64
uberty     0.63
CEPT       0.63
...]       0.62
Activations Density: 0.208%