Explanations
adjectives or phrases related to power or strength
Negative Logits
edIn          -0.74
Seym          -0.72
gow           -0.68
assetsadobe   -0.64
Greenwald     -0.63
Lauder        -0.61
unborn        -0.61
OPLE          -0.60
Reloaded      -0.60
Retrieved     -0.60
Positive Logits
imposed       1.33
nova          1.31
visor         1.29
visory        1.20
charg         1.16
visors        1.14
charged       1.12
intend        1.09
cedes         1.07
cell          1.07
Activations Density: 0.396%