Explanations
phrases indicating exclusivity or restricted access
Negative Logits
Token     Logit
rey       -0.19
ucken     -0.16
orous     -0.15
ino       -0.15
ún        -0.15
esch      -0.15
reek      -0.15
서        -0.15
işte      -0.14
ug        -0.14
Positive Logits
Token        Logit
ively        0.26
exclusive    0.24
exclusive    0.23
exclus       0.23
/original    0.21
ities        0.20
Exclusive    0.20
Exclusive    0.19
-use         0.19
-exclusive   0.19
Activations Density: 0.018%
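
To give a sense of scale for the density figure, here is a minimal sketch that assumes "activation density" means the fraction of token positions on which the feature fires (activation above zero). That definition, the function name activation_density, and the sample data are illustrative assumptions, not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold.

    Assumed definition: a position counts as active when its
    activation is strictly greater than `threshold`.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 token positions, 18 of which are active.
sample = [0.0] * 99_982 + [1.5] * 18
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.018% corresponds to roughly 18 active positions per 100,000 tokens, which is typical of a sparse, narrowly scoped feature.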