INDEX
Explanations
phrases indicating the timing of events
phrases indicating events or actions that are occurring or reporting on developments
Negative Logits
dden        -0.83
guided      -0.83
ee          -0.77
urse        -0.76
oller       -0.75
rendered    -0.75
ignty       -0.74
raid        -0.74
olor        -0.74
oru         -0.73
Positive Logits
undone      0.96
alive       0.76
amid        0.73
forward     0.72
backs       0.71
Forth       0.70
ashore      0.69
amidst      0.69
back        0.68
testament   0.68
Activations Density 0.050%