Explanations
references to the band Led Zeppelin
Negative Logits
kus       -0.19
pps       -0.16
imler     -0.16
mpp       -0.15
wares     -0.15
inders    -0.15
iale      -0.15
leading   -0.15
Hust      -0.15
fik       -0.15
Positive Logits
gers      0.27
better    0.25
Ze        0.21
gend      0.20
astr      0.19
roit      0.19
ges       0.19
ger       0.18
GER       0.17
oux       0.17
Activations Density: 0.010%