    Explanations

    references to specific movies and their elements

    Negative Logits
    Token           Logit
    "steen"         -0.19
    "ione"          -0.16
    "662"           -0.15
    " sake"         -0.15
    "bens"          -0.15
    "irie"          -0.14
    "izz"           -0.14
    " deflate"      -0.14
    "URY"           -0.14
    "ALSE"          -0.14
    Positive Logits
    Token           Logit
    " Terminator"   0.32
    " TERMIN"       0.32
    " terminator"   0.31
    " Termin"       0.30
    " Schwar"       0.27
    "Ter"           0.27
    " Arnold"       0.26
    "termin"        0.24
    " termin"       0.24
    " terminated"   0.23
    Activation Density: 0.008%

    No Known Activations