INDEX
Explanations
references to types of wine
Negative Logits
loit      -0.16
ãng       -0.15
elu       -0.15
iÄĻ       -0.14
national  -0.14
oten      -0.14
.zh       -0.14
lobs      -0.14
iges      -0.14
censor    -0.14
Positive Logits
Gone      0.17
Alive     0.15
aroo      0.15
éĦī       0.14
eras      0.14
opak      0.14
spaced    0.14
rip       0.13
ìŀ¡       0.13
ripp      0.13
Activations Density 0.003%