INDEX
    Explanations

    references to specific movies or their notable attributes

    Negative Logits
    olu             -0.17
    izzo            -0.17
    erno            -0.15
    antino          -0.15
    ahren           -0.15
    ello            -0.15
    oso             -0.15
    Jvm             -0.15
    jos             -0.15
    inus            -0.14
    Positive Logits
     power          0.17
     horizontal     0.17
    -power          0.16
     Horizontal     0.15
     poder          0.15
     Power          0.15
     POWER          0.15
     scaleY         0.15
    del             0.15
     age            0.15
    Activation Density 0.023%

    No Known Activations