INDEX
Explanations
dates or time-related information
phrases related to consequences or judgments
Negative Logits
doesnt          -0.65
didnt           -0.61
dont            -0.59
DragonMagazine  -0.59
whats           -0.53
alot            -0.52
secondly        -0.52
thous           -0.51
tyr             -0.51
yss             -0.50
Positive Logits
*.              1.17
.*              1.11
.[              1.10
.''.            1.09
.</             1.05
.''             1.01
.).             0.99
.               0.98
.ãĢį            0.98
.�              0.94
Activations Density 2.329%