Explanations
words and phrases associated with brightness and positivity
Negative Logits
Token     Logit
hattan    -0.18
hiro      -0.17
ÑģÑı      -0.16
nd        -0.16
ather     -0.15
oooo      -0.15
hort      -0.15
ге        -0.15
BLE       -0.14
isy       -0.14
Positive Logits
Token     Logit
ening     0.40
ened      0.36
eners     0.28
-eyed     0.26
ens       0.25
ener      0.23
en        0.23
enment    0.22
eyed      0.21
enin      0.21
Activations Density: 0.028%
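
As a rough illustration of what a density figure like this means, the sketch below assumes density is the fraction of sampled tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, and the toy data are assumptions made for illustration and are not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 100,000 tokens, 28 of which activate the feature.
toy = [0.0] * 99_972 + [1.5] * 28
density = activation_density(toy)
print(f"{density:.3%}")  # prints "0.028%"
```

Under this reading, a density of 0.028% corresponds to the feature firing on roughly 28 of every 100,000 tokens.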