Explanations
references to historical figures or events in cinema
Negative Logits
ragaz    -0.17
eskort   -0.16
äd       -0.15
misunder -0.15
lun      -0.15
olum     -0.15
Beit     -0.14
äºľ      -0.14
mür      -0.14
loh      -0.14
Positive Logits
van      0.19
overs    0.19
ste      0.18
.nl      0.17
lij      0.17
overd    0.16
igh      0.16
af       0.16
ieu      0.16
h        0.15
Activation Density: 0.177%