INDEX
Explanations
references to movies and specific film-related terms
Negative Logits
člov      -0.20
zástup    -0.19
ejména    -0.18
vůli      -0.16
říz       -0.15
zdrav     -0.15
účin      -0.14
opráv     -0.14
lesbisk   -0.14
bir       -0.14
Positive Logits
a         0.34
[z        0.18
a         0.18
nebo      0.17
a         0.17
nam       0.17
na        0.17
ve        0.17
,         0.16
či        0.16
Activations Density 0.010%
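
The density figure above can be read as the share of sampled tokens on which the feature fires. The sketch below is a minimal illustration under that assumption (strictly positive activation counts as firing); the function name `activation_density` and the toy data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations):
    """Return the fraction of activations that are strictly positive.

    Assumption: a token counts as "firing" when its activation is > 0.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > 0)
    return fired / len(activations)


# Toy data (hypothetical): one firing token out of 10,000 sampled tokens.
toy = [0.0] * 9999 + [1.7]
density = activation_density(toy)
print(f"{density:.3%}")  # prints "0.010%"
```

A density this low means the feature is very sparse: on the order of one token in ten thousand activates it.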