INDEX
Explanations
phrases indicating consequences or results
Negative Logits
Token           Logit
aat             -0.15
ãĢĤãģĿãģĹãģ¦    -0.13
živ             -0.13
cid             -0.13
иÑĤов           -0.13
jÃŃt            -0.13
ysa             -0.12
lux             -0.12
izedName        -0.12
verbatim        -0.12
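Several of the tokens listed above (for example ãĢĤãģĿãģĹãģ¦ and jÃŃt) look like raw byte-level BPE vocabulary strings rather than readable text. The sketch below shows one way to turn them back into text, assuming the vocabulary uses the GPT-2 style byte-to-unicode table. That is an assumption based on how the strings look, not something this page states. The helper names `bytes_to_unicode` and `decode_token` are illustrative. Characters outside the table are treated as already-readable text, which is a heuristic for mixed strings such as иÑĤов and can misread a token that genuinely contains an accented Latin-1 character.

```python
def bytes_to_unicode():
    """Build the GPT-2 style byte -> printable-character table (assumed vocabulary format)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (control characters, space, etc.) are shifted to code points 256 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: printable stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Map a byte-level vocabulary string back to readable UTF-8 text."""
    raw = bytearray()
    for ch in token:
        if ch in UNICODE_TO_BYTE:
            # Stand-in character: recover the single byte it represents.
            raw.append(UNICODE_TO_BYTE[ch])
        else:
            # Heuristic: a character outside the table is already readable text.
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĢĤãģĿãģĹãģ¦", "иÑĤов", "jÃŃt", "verbatim"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, the three unreadable entries decode to Japanese, Cyrillic, and Czech fragments, while plain ASCII tokens such as verbatim pass through unchanged.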
Positive Logits
Token           Logit
activity        0.23
sorts           0.20
possibilities   0.19
possibility     0.18
events          0.18
Activity        0.18
activity        0.16
Activity        0.16
ivery           0.16
ideas           0.16
Activations Density 0.186%