Explanations
instances of the word "when" indicating temporal context
Negative Logits
Token        Logit
ynos         -0.16
raç          -0.14
zas          -0.14
iov          -0.14
perience     -0.14
602          -0.14
nw           -0.14
.axis        -0.14
izo          -0.14
ocy          -0.14
Positive Logits
Token        Logit
finally      0.18
eken         0.15
asked        0.15
compared     0.14
finished     0.14
vre          0.14
次           0.14
azy          0.14
shar         0.14
åĬ           0.13
Activations Density: 0.226%
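
The density figure is most naturally read as the share of token positions on which the feature fires. The sketch below shows one way such a percentage could be computed. It assumes, which this page does not state, that a position counts as active when its activation is strictly above zero. The function name, the threshold parameter, and the sample values are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold (0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations for 8 token positions; only one is non-zero.
sample = [0.0, 0.0, 0.0, 1.7, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(sample):.3f}%")
```

On this reading, a density of 0.226% would mean the feature is active on roughly 2 of every 1,000 tokens sampled.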