INDEX
    Explanations

    names of actors or entertainment industry professionals

    phrases that introduce examples or lists

    Negative Logits
    gans         -0.85
    istical      -0.82
    oric         -0.72
    reatment     -0.72
    orship       -0.72
    enser        -0.71
    essing       -0.70
    rison        -0.70
    idate        -0.69
    ivities      -0.69
    Positive Logits
     Alfred       0.75
     Cowboy       0.73
     Jasper       0.73
     Esper        0.73
     Beautiful    0.73
     Brig         0.72
     Exodus       0.72
     Bald         0.72
     Martha       0.71
     Jeremiah     0.71
    Act Density 0.153%

    No Known Activations