Explanations
time-related phrases
references to temporal phrases and questions regarding timing
NEGATIVE LOGITS
enegger     -0.83
ortment     -0.76
aking       -0.73
hid         -0.68
aughed      -0.68
edIn        -0.68
POR         -0.67
ruption     -0.67
kaya        -0.66
tyard       -0.65
POSITIVE LOGITS
exactly      1.15
soever       1.14
abouts       0.82
someone      0.76
ce           0.75
they         0.73
somebody     0.71
prompted     0.71
else         0.69
quizz        0.68
Activations Density 0.080%