INDEX
    Explanations

    mentions of actors and related terminology

    Negative Logits
    erable    -0.18
    seo       -0.17
    ned       -0.17
    ader      -0.16
    erator    -0.16
    tera      -0.16
    est       -0.16
    aret      -0.16
    etry      -0.16
    arily     -0.15

    Positive Logits
    uate       0.20
    -direct    0.18
    /music     0.17
    uating     0.17
    uated      0.16
    umba       0.16
    prene      0.16
    uator      0.16
    uation     0.16
    .Actor     0.16

    Activation Density 0.011%
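
    Activation density is usually read as the fraction of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly greater than zero. Under that reading, 0.011% corresponds to about 11 active tokens per 100,000, or roughly one token in 9,000. The `activation_density` helper and the synthetic activation list are hypothetical.

```python
# Sketch: activation density = active tokens / total tokens.
# ASSUMPTION: a token counts as active when its activation is > 0.

def activation_density(activations):
    """Fraction of tokens on which the feature is active."""
    if not activations:
        raise ValueError("need at least one token")
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)

# Hypothetical data: 11 active tokens out of 100,000.
acts = [0.0] * 100_000
for i in range(11):
    acts[i * 9_000] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # 0.011%
```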

    No Known Activations