INDEX
Explanations
words related to enhancement or improvement
phrases indicating a positive contribution or improvement
Negative Logits
avior          -0.82
Advertisement  -0.74
AAAAAAAA       -0.64
aneous         -0.62
cz             -0.61
irus           -0.60
DH             -0.60
unn            -0.60
CDC            -0.60
apologized     -0.59
Positive Logits
humankind   0.74
otos        0.73
srf         0.72
accompany   0.67
axy         0.67
tones       0.66
ggles       0.66
compensate  0.66
mankind     0.66
ound        0.65
Activations Density: 0.133%