INDEX
Explanations
positive attributes or characteristics attributed to individuals
Negative Logits
olas      -0.77
phabet    -0.77
dict      -0.69
ees       -0.64
isson     -0.64
Pru       -0.60
Jah       -0.59
Papers    -0.59
Graves    -0.58
eh        -0.58
Positive Logits
THING          1.60
conceivable    1.41
imaginable     1.31
where          1.21
single         1.07
single         1.04
inch           1.03
WHERE          0.98
ounce          0.97
thin           0.96
Activations Density 0.552%