INDEX
Explanations
pronouns and verbs indicating actions or states of being
Negative Logits
anka          -0.15
ocks          -0.15
must          -0.15
doesn         -0.14
chos          -0.14
oner          -0.14
immedi        -0.14
escaping      -0.14
ãĥ¼ãĥª        -0.13
ãĥªãĥ¼ãĤº      -0.13
Positive Logits
near          0.31
ne            0.29
prepares      0.28
prepare       0.28
prepared      0.27
near          0.27
gears         0.24
prepared      0.24
await         0.23
gear          0.23
Activations Density: 0.169%