INDEX
Explanations
words related to ferociousness and intensity
Negative Logits
785        -0.19
imest      -0.17
ystore     -0.17
Pant       -0.15
Stones     -0.15
erna       -0.14
.TODO      -0.14
eenth      -0.14
sd         -0.14
ionales    -0.14
POSITIVE LOGITS
ocious     0.30
mentation  0.28
ret        0.26
reira      0.26
rets       0.26
ocity      0.24
mented     0.23
rous       0.23
tility     0.22
intosh     0.22
Activations Density 0.011%