INDEX
Explanations
words related to amplifying, increasing, or elevating concepts, especially in contexts of actions or attributes
Negative Logits
er        -0.20
ر         -0.18
RF        -0.18
berman    -0.17
rnd       -0.17
lsen      -0.17
746       -0.16
rand      -0.15
por       -0.15
lad       -0.15
Positive Logits
stead     0.21
shire     0.20
ylon      0.19
arts      0.18
site      0.18
y         0.18
ster      0.17
ithe      0.16
elier     0.16
agne      0.15
Activations Density: 0.034%