Explanations
words indicating the initiation or commencement of actions or events
Negative Logits
overcome    -0.17
quote       -0.14
hy          -0.14
stone       -0.14
thr         -0.14
sexual      -0.14
uck         -0.13
loader      -0.13
stones      -0.13
OLLOW       -0.13
Positive Logits
happening   0.24
zia         0.18
appearing   0.16
its         0.16
avit        0.15
appe        0.15
oad         0.15
ấm          0.15
occurring   0.15
¶Į          0.14
Activations Density 0.051%