INDEX
Explanations
words that describe negative or harmful attributes
Negative Logits
pleaſure       -1.23
ſta            -1.13
houſe          -1.12
Majefty        -1.11
Efq            -1.11
lyre           -1.10
fermés         -1.07
stateProvider  -1.05
ſtre           -1.04
définiti       -1.04
Positive Logits
ness    1.30
ous     1.09
IOUS    1.01
ious    0.90
acious  0.84
EROUS   0.84
icious  0.82
rious   0.80
dious   0.77
s       0.77
Activations Density: 0.072%