INDEX
Explanations
references to song titles and lyrics from various rock bands
Negative Logits
ppelin       -0.15
mpar         -0.15
Proceed      -0.14
hed          -0.14
tiener       -0.14
Petty        -0.14
REM          -0.14
Deng         -0.14
bidi         -0.13
multiplic    -0.13
Positive Logits
erial        0.16
efa          0.15
ugins        0.15
-exclusive   0.14
phin         0.14
Miz          0.14
بÙĪØ±       0.14
acic         0.14
Borg         0.14
UserCode     0.14
Activations Density 0.020%