Explanations
words related to physical actions or movements
Negative Logits
Token          Logit
============   -0.74
uate           -0.70
inates         -0.65
preferential   -0.64
======         -0.64
senal          -0.64
▬▬             -0.63
sender         -0.61
negligent      -0.61
resting        -0.60

Positive Logits
Token          Logit
imming         1.24
indle          1.22
itched         1.16
addle          1.07
ifty           1.07
inging         1.07
itching        1.03
ollen          1.03
immers         1.03
agger          1.02
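
The page does not say how the logit values above are computed. One common reading, assumed here rather than confirmed by the source, is that each value is the dot product of the feature's output direction with a token's unembedding vector, with the lowest and highest results shown as the negative and positive lists. A minimal sketch of that reading follows; the tokens `tok_a` to `tok_e`, the vectors, and the function names are all hypothetical and do not reproduce the listed values.

```python
def token_logit_effects(feature_direction, unembedding):
    """Dot the feature direction with each token's unembedding vector."""
    return {
        token: sum(f * u for f, u in zip(feature_direction, vector))
        for token, vector in unembedding.items()
    }


def bottom_and_top(effects, k):
    """Return the k most negative and the k most positive (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda pair: pair[1])
    return ranked[:k], ranked[::-1][:k]


# Toy 2-dimensional data, invented purely for illustration.
feature_direction = [1.0, -0.5]
unembedding = {
    "tok_a": [1.0, -0.5],
    "tok_b": [1.0, -0.25],
    "tok_c": [0.0, 0.0],
    "tok_d": [-0.5, 0.25],
    "tok_e": [-0.5, 0.5],
}

effects = token_logit_effects(feature_direction, unembedding)
most_negative, most_positive = bottom_and_top(effects, k=2)

print("Negative:", most_negative)
print("Positive:", most_positive)
```

Under this assumption, a token with a large positive value is one the feature pushes the model toward predicting, and a large negative value marks a token it pushes against.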
Activations Density 0.380%
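
Assuming that activation density means the fraction of sampled token positions on which the feature has a nonzero activation (the page gives no definition), a value of 0.380% corresponds to roughly 38 active positions per 10,000. The sketch below illustrates that arithmetic on invented data; `activation_density` and its `threshold` parameter are hypothetical names.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds the threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for value in activations if value > threshold)
    return active / len(activations)


# Toy sample, invented for illustration: 38 active positions out of 10,000.
toy_activations = [1.5] * 38 + [0.0] * 9962
density = activation_density(toy_activations)

print(f"{density:.3%}")
```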