INDEX
Explanations
expressions of personal or collective emotions and experiences
Negative Logits
ided    -0.20
rava    -0.18
uppy    -0.18
ity     -0.17
enaire  -0.17
linky   -0.16
sWith   -0.16
dy      -0.16
reated  -0.16
ogle    -0.15
Positive Logits
lessly  0.30
less    0.24
making  0.21
chal    0.20
ful     0.19
LESS    0.19
fully   0.18
i       0.17
ible    0.17
idon    0.17
Activations Density: 0.016%