INDEX
    Explanations

    references to historical figures or events in cinema

    Negative Logits
    " ragaz"       -0.17
    " eskort"      -0.16
    "äd"           -0.15
    " misunder"    -0.15
    "lun"          -0.15
    "olum"         -0.15
    " Beit"        -0.14
    "äºľ"          -0.14
    "mür"          -0.14
    "loh"          -0.14

    Positive Logits
    " van"          0.19
    " overs"        0.19
    " ste"          0.18
    ".nl"           0.17
    " lij"          0.17
    " overd"        0.16
    "igh"           0.16
    " af"           0.16
    "ieu"           0.16
    " h"            0.15
    Activation Density: 0.177%

    No Known Activations