INDEX
    Explanations

    movie titles, specifically focusing on the presence of the word "starring"

    references to actors and their roles in films

    Negative Logits
    Token             Logit
    "alde"            -0.70
    " EAR"            -0.68
    "upon"            -0.63
    " individual"     -0.62
    "oat"             -0.62
    "IB"              -0.61
    " instr"          -0.61
    "abis"            -0.60
    "XT"              -0.60
    " individuals"    -0.59
    Positive Logits
    Token             Logit
    " starring"       3.74
    " starred"        1.65
    " featuring"      1.52
    " cameo"          1.13
    " stars"          1.09
    " Featuring"      1.09
    " showcasing"     1.07
    " portraying"     1.05
    " depicting"      1.04
    "stars"           1.04
    Activation Density 0.014%

    No Known Activations