Explanation
Infinitive verbs or phrases indicating an action or purpose
Negative Logits
nyder        -0.18
å§¿ (姿)     -0.15
æı           -0.14
anten        -0.14
him          -0.14
è͵           -0.13
Ampl         -0.13
ÙĦÙģ (لف)    -0.13
оÑĢож        -0.13
ampl         -0.13
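Several negative-logit tokens, such as å§¿ and ÙĦÙģ, look garbled because they are most likely shown in the raw form of a byte-level BPE vocabulary, where each UTF-8 byte is mapped to a printable character. The sketch below assumes the model uses GPT-2's byte-to-unicode table (the model is not named here) and inverts that mapping; `decode_token` is a hypothetical helper name. A token like æı holds only part of a multi-byte character, so it cannot decode to a complete character on its own.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style reversible mapping from each byte value to a printable character."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (space, control characters, etc.) are shifted to code points 256+.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: printable character -> original byte value.
BYTE_DECODER = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a byte-level BPE token string back into text."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    # A token may end mid-character; undecodable bytes become U+FFFD.
    return raw.decode("utf-8", errors="replace")


print(decode_token("å§¿"))   # 姿 (U+59FF)
print(decode_token("ÙĦÙģ"))  # لف (U+0644 U+0641)
print(decode_token("æı"))    # incomplete character, shown as a replacement char
```

Under the same assumption, a leading Ġ in a token stands for a space (for example, "Ġhelp" decodes to " help").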
Positive Logits
help         0.25
enable       0.23
allow        0.22
better       0.21
hopefully    0.20
which        0.19
suit         0.18
/from        0.18
gether       0.18
support      0.18
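Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the largest and smallest entries. The sketch below assumes exactly that procedure; this page does not state how its values were computed (for example, whether a final layer-norm scale is folded in). `top_logit_tokens`, `decoder_dir`, and `W_U` are hypothetical names, and the weights are random placeholders rather than real model parameters.

```python
import numpy as np


def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Return the k most positive and k most negative (token, logit) pairs.

    decoder_dir: shape (d_model,), the decoder vector of one feature (assumed input).
    W_U: shape (d_model, vocab_size), the unembedding matrix.
    """
    logits = decoder_dir @ W_U          # direct effect on each output token's logit
    order = np.argsort(logits)          # indices sorted by ascending logit
    neg = [(vocab[i], float(logits[i])) for i in order[:k]]
    pos = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return pos, neg


# Toy example with random weights; real values require the actual model and SAE.
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 20
vocab = [f"tok{i}" for i in range(vocab_size)]
W_U = rng.normal(size=(d_model, vocab_size))
decoder_dir = rng.normal(size=d_model)

pos, neg = top_logit_tokens(decoder_dir, W_U, vocab, k=3)
print("positive:", pos)
print("negative:", neg)
```

Read this way, a positive entry such as "help" at 0.25 means the feature's direction directly raises that token's logit, which fits the explanation of purpose-style verbs following "to".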
Activation density: 0.236%
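Activation density is usually the percentage of tokens on which the feature fires. At 0.236%, that is roughly 2.36 tokens per 1,000, so the feature is sparse. The sketch below assumes "fires" means an activation strictly above zero (the page does not state its threshold); `activation_density` is a hypothetical helper name.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Toy activations over 10 tokens; 2 are nonzero, so the density is 20%.
print(activation_density([0, 0, 1.3, 0, 0, 0, 0.4, 0, 0, 0]))  # 20.0
```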