INDEX
Explanations
references to political and musical themes
Negative Logits
anz       -0.18
v         -0.17
oven      -0.17
overl     -0.16
'         -0.16
vester    -0.16
andle     -0.15
ugh       -0.15
fro       -0.15
posled    -0.15
Positive Logits
ują       0.23
ł         0.23
ów        0.22
ją        0.22
ła        0.21
ię        0.21
że        0.21
ś         0.21
ają       0.21
ście      0.21
Activations Density 0.324%