INDEX
Explanations
phrases indicating time duration or continuity
Negative Logits
Token    Logit
uling    -0.16
geois    -0.16
deaux    -0.16
viso     -0.15
inya     -0.15
hiba     -0.14
луш      -0.14
resse    -0.14
achine   -0.14
sein     -0.14
Positive Logits
Token    Logit
before   0.28
before   0.21
Before   0.20
shortly  0.20
first    0.19
they     0.19
forever  0.19
antes    0.19
Before   0.18
BEFORE   0.18
Activations Density 0.039%
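
The density figure above is usually read as the fraction of token positions on which the feature fires at all. The sketch below illustrates that reading; it is not taken from the dashboard's own code. The function name `activation_density`, the zero threshold, and the sample data (39 active positions out of 100,000, chosen only to match 0.039%) are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    is active, with "active" meaning strictly above the threshold.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 39 of 100,000 token positions.
sample = [1.5] * 39 + [0.0] * 99_961
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low means the feature is sparse: it stays silent on nearly all tokens and fires on a narrow set of contexts.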