Explanations
words related to intensity or strength
words related to independence
Negative Logits
MQ        -0.72
veyard    -0.68
Ĥİ        -0.65
--+       -0.65
Gate      -0.64
BY        -0.62
calling   -0.61
culosis   -0.61
sonian    -0.61
Lumpur    -0.60
Positive Logits
ented     1.07
etermin   1.01
oled      0.98
irection  0.89
ents      0.89
iour      0.85
etr       0.84
ignant    0.83
inged     0.83
rawn      0.83
Activations Density: 0.031%