Explanations
references to actions or intentions involving "to" followed by verbs
Negative Logits
つけ        -0.18
開          -0.16
見          -0.15
soever      -0.15
oppers      -0.14
cq          -0.14
erialize    -0.14
zelf        -0.14
oulder      -0.14
apesh       -0.14
Positive Logits
/from       0.32
gether      0.31
plevel      0.23
ying        0.21
tes         0.20
lỼn         0.20
tem         0.20
xic         0.19
ogle        0.19
asting      0.19
Activations Density: 1.691%