INDEX
    Explanations

    references to movies and specific film-related terms

    Negative Logits
     člov
    -0.20
     zástup
    -0.19
    ejména
    -0.18
    vůli
    -0.16
    říz
    -0.15
     zdrav
    -0.15
     účin
    -0.14
     opráv
    -0.14
     lesbisk
    -0.14
     bir
    -0.14
    Positive Logits
     a
    0.34
    [z
    0.18
    	a
    0.18
     nebo
    0.17
    a
    0.17
     nam
    0.17
     na
    0.17
     ve
    0.17
    ,
    0.16
     či
    0.16
    Act Density 0.010%

    No Known Activations