Explanations
words related to degrading or destructive actions
instances of the prefix "des" and associated terms
Negative Logits
tipped      -0.74
Holmes      -0.72
Hastings    -0.69
OWS         -0.66
ancial      -0.64
STER        -0.64
caregivers  -0.63
Palmer      -0.61
glers       -0.61
DAY         -0.60
Positive Logits
ignt        1.03
plet        1.02
icc         1.01
ktop        1.00
ugar        0.99
ync         0.98
ription     0.95
ynchron     0.93
aturated    0.92
pell        0.92
Activation Density: 0.011%