Explanations
references to duration or the passage of time
Negative Logits
Token   Logit
rios    -0.16
adol    -0.16
eyn     -0.15
tran    -0.15
esin    -0.15
ksi     -0.14
><?     -0.14
lod     -0.14
ais     -0.14
wich    -0.14
Positive Logits
Token   Logit
ago     0.23
stay    0.22
Stay    0.20
stays   0.18
Stay    0.18
ç¶ļ     0.18
Ñıб     0.17
stayed  0.17
Ago     0.16
stay    0.16
Activations Density 0.065%