INDEX
Explanations
instances of "first" and related phrases that indicate initial thoughts or experiences
Negative Logits
currently    -0.17
finally      -0.16
alli         -0.16
已           -0.15
illard       -0.15
elia         -0.14
ultimately   -0.14
缮åīį        -0.14
sonst        -0.14
now          -0.14
Positive Logits
initially    0.29
Initially    0.25
Initially    0.24
inicial      0.20
initial      0.18
initial      0.18
ulo          0.16
639          0.16
(initial     0.16
scept        0.16
Activations Density 0.085%