INDEX
Explanations
negative connotations or mentions of decline
Negative Logits
Token        Logit
377          -0.15
nore         -0.14
360          -0.14
Strauss      -0.14
urette       -0.14
865          -0.14
533          -0.14
иж           -0.14
licensors    -0.14
tar          -0.14
Positive Logits
Token        Logit
.sponge      0.17
ufen         0.16
(#)          0.16
ë©ĺ          0.15
ingen        0.14
jen          0.14
cred         0.14
ewater       0.14
iment        0.14
ç»ĵ          0.14
Activations Density 0.018%