Explanations
words related to concerns or worries
Negative Logits
nice       -0.74
ingers     -0.74
lite       -0.70
ples       -0.61
ophone     -0.61
handy      -0.60
iller      -0.58
unes       -0.58
dating     -0.58
slick      -0.57
Positive Logits
warts      1.06
wart       1.04
lessly     0.91
about      0.85
trolling   0.84
ingly      0.82
bells      0.81
ieties     0.80
ABOUT      0.77
regarding  0.76
Activations Density: 0.532%