INDEX
Explanations
words related to qualities or characteristics
words related to emotional states or conditions
Negative Logits
Token       Logit
ãĥĥãĥī      -0.65
bern        -0.64
ODE         -0.63
ellar       -0.62
verbs       -0.61
Auschwitz   -0.61
roit        -0.60
ANN         -0.59
OY          -0.59
amen        -0.59
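The first negative-logit token, `ãĥĥãĥī`, is probably not extraction damage. It looks like the printable form that byte-level BPE vocabularies use for raw UTF-8 bytes. The sketch below assumes the model behind this page uses a GPT-2-style byte-level vocabulary, which the page itself does not state. The helper names `bytes_to_unicode` and `decode_token` are illustrative and not taken from the page.

```python
def bytes_to_unicode():
    """Build the byte -> printable-character table used by GPT-2-style
    byte-level BPE (assumed here; the page does not name its tokenizer)."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(0x21, 0x7F))      # '!' .. '~'
        + list(range(0xA1, 0xAD))    # U+00A1 .. U+00AC
        + list(range(0xAE, 0x100))   # U+00AE .. U+00FF
    )
    table = {b: chr(b) for b in printable}
    # Every remaining byte is shifted to code points starting at U+0100.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Inverse table: printable vocabulary character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a vocabulary-form token back into readable text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # Single tokens may hold partial UTF-8 sequences, so replace bad bytes.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("\u00e3\u0125\u0125\u00e3\u0125\u012b"))
```

Under that assumption the token maps to the bytes E3 83 83 E3 83 89, which is the katakana string "ッド". The other tokens in the list are plain ASCII and decode to themselves.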
Positive Logits
Token       Logit
iness       1.16
terness     0.99
ness        0.97
nesses      0.94
ionage      0.81
Flavoring   0.79
yy          0.77
liness      0.77
hip         0.75
osuke       0.75
Activations Density 0.028%