INDEX
Explanations
modal verbs indicating potential actions or conditions
Negative Logits
Winds       -0.17
hn          -0.16
istas       -0.16
kers        -0.15
themselves  -0.15
ilib        -0.15
w           -0.15
hor         -0.15
ilde        -0.14
.snap       -0.14
Positive Logits
iner        0.26
raining     0.25
chy         0.20
alo         0.19
eless       0.19
edn         0.18
alic        0.17
ty          0.17
chalk       0.17
Lane        0.17
Activations Density: 0.158%