INDEX
Explanations
words related to being satisfied or approved of, as well as words indicating support or backing
words related to the concept of justification
Negative Logits
thur         -0.72
dimension    -0.72
fall         -0.70
Bees         -0.68
kers         -0.65
ACH          -0.65
Archdemon    -0.64
FTWARE       -0.64
ker          -0.64
alter        -0.63
Positive Logits
ified        1.15
ification    0.92
ifies        0.87
ify          0.86
ourgeois     0.84
urally       0.81
IFIED        0.81
ifix         0.79
ibaba        0.79
ific         0.78
Activations Density 0.013%
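
The page does not say how the density figure is computed. A common reading is that it is the fraction of sampled token positions on which the feature fires at all. The sketch below works under that assumption; the function name `activation_density`, the zero threshold, and the sample data are hypothetical illustrations, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature is active. The dashboard itself does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 13 of which activate the feature.
sample = [0.0] * 99_987 + [1.5] * 13
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.013% would correspond to roughly 13 active positions per 100,000 tokens, which is typical of a sparse feature.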