Explanation: references to original works or concepts
Negative Logits
zast     -0.14
izard    -0.14
ic       -0.14
554      -0.14
lish     -0.14
akh      -0.14
vě       -0.14
ands     -0.14
allo     -0.13
toi      -0.13
Positive Logits
/original    0.25
original     0.19
Original     0.18
ORIGINAL     0.17
-original    0.16
ajo          0.16
etty         0.16
original     0.15
erotico      0.15
(original    0.15
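The negative and positive logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A minimal sketch of one common way to compute such lists, assuming the feature's decoder direction is projected directly through the model's unembedding matrix and the final LayerNorm is ignored; the dashboard's exact procedure may differ. The helper name `top_logit_effects` and the toy inputs are hypothetical.

```python
import numpy as np


def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Hypothetical helper: rank tokens by a feature's direct logit effect.

    w_dec_row: (d_model,) decoder direction of one SAE feature (assumed).
    W_U:       (d_model, d_vocab) unembedding matrix (assumed layout).
    vocab:     list of d_vocab token strings.
    """
    # Direct effect on every output logit (final LayerNorm ignored).
    effects = w_dec_row @ W_U  # shape: (d_vocab,)
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with d_model = 2 and a 4-token vocabulary.
w = np.array([1.0, 0.0])
W_U = np.array([[0.25, -0.14, 0.10, 0.0],
                [0.00, 0.00, 0.00, 0.0]])
vocab = ["original", "zast", "ajo", "ic"]
neg, pos = top_logit_effects(w, W_U, vocab, k=2)
print(neg)  # [('zast', -0.14), ('ic', 0.0)]
print(pos)  # [('original', 0.25), ('ajo', 0.1)]
```

Sorting once and reading both ends keeps the negative and positive lists consistent with each other.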
Activation density: 0.141%
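Activation density is usually reported as the fraction of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero; the helper name `activation_density` and the synthetic activations are hypothetical. A density of 0.141% corresponds to roughly 141 active positions per 100,000 tokens.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Hypothetical helper: fraction of positions where activation > threshold."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))


# Synthetic activations: 141 active positions out of 100,000.
acts = np.zeros(100_000)
acts[:141] = 1.0
density = activation_density(acts)
print(f"{100 * density:.3f}%")  # 0.141%
```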