INDEX
Explanations
instances of words or phrases indicating action or movement
Negative Logits
prostor          -0.15
agrams           -0.15
lanma            -0.15
-lnd             -0.14
elts             -0.14
imeType          -0.14
quire            -0.14
enek             -0.14
BuilderFactory   -0.14
inds             -0.14
Positive Logits
idable           0.25
able             0.25
ulous            0.24
izable           0.24
inous            0.23
ous              0.23
tractive         0.23
idious           0.22
eworthy          0.22
urious           0.22
Activations Density: 0.003%