Explanations
past tense verbs indicating actions or states
Negative Logits
ford      -0.15
æĥij       -0.15
ead       -0.15
anned     -0.14
çļĦæĺ¯    -0.14
ally      -0.14
rosso     -0.14
ilda      -0.14
adel      -0.13
ically    -0.13

Positive Logits
/is       0.35
nt        0.31
htub      0.18
nts       0.17
NT        0.17
/w        0.16
zik       0.16
indeed    0.15
ìĬ´       0.15
ps        0.15

Activations Density: 0.442%