INDEX
Explanations
references to music artists and their works
Negative Logits
Token          Logit
476            -0.17
Examiner       -0.17
imore          -0.16
ãĥ³ãĥĨãĤ£      -0.15
stral          -0.15
755            -0.15
ernel          -0.14
ahr            -0.14
quo            -0.14
dk             -0.14
Positive Logits
Token          Logit
Toro           0.20
Neutral        0.20
Baths          0.19
Neutral        0.18
Fucked         0.18
Battles        0.17
Tune           0.17
dum            0.17
Perf           0.17
Strand         0.17
Activations Density 0.022%
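
As a rough illustration of what a density figure like this can mean, the sketch below treats activation density as the fraction of token positions on which the feature's activation exceeds a threshold. That definition, the function name `activation_density`, and the sample data are assumptions made for illustration; the page does not say how its value was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds the threshold.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 22 of which activate the feature.
sample = [0.0] * 99_978 + [1.0] * 22
density = activation_density(sample)
print(f"{density:.3%}")
```

With these made-up counts the feature fires on 22 of 100,000 positions, which is the same order of sparsity as the figure reported above.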