Explanations
Phrases related to supporting and standing up for other people or for causes
Negative Logits
strar      -0.17
osate      -0.16
endale     -0.16
edad       -0.16
ceptive    -0.16
icus       -0.15
going      -0.15
dept       -0.15
sein       -0.14
endon      -0.14
Positive Logits
-alone     0.23
rase       0.16
ings       0.15
Charsets   0.15
ARDS       0.14
rev        0.14
496        0.14
-valu      0.14
بÛĮر       0.13
CLUDING    0.13
Activations Density 0.046%