INDEX
Explanations
phrases that describe processes and actions related to involvement or engagement in activities
Negative Logits
assin      -0.19
же         -0.15
ian        -0.15
svůj       -0.15
оди        -0.15
umer       -0.14
anium      -0.14
家的       -0.14
ymb        -0.13
redient    -0.13
Positive Logits
only       0.19
both       0.19
собой      0.18
:          0.18
elements   0.17
besides    0.17
lots       0.17
mainly     0.17
fewer      0.17
mostly     0.16
Activations Density 0.198%