INDEX
    Explanations

    mentions of specific names and entities

    capitalized proper nouns, particularly names and titles of entities

    Negative Logits
    Token           Logit
    " showc"        -0.70
    "arching"       -0.69
    " cort"         -0.63
    " forth"        -0.62
    " contrace"     -0.62
    " psychiat"     -0.61
    " horm"         -0.59
    "forth"         -0.58
    " Sylv"         -0.57
    " embargo"      -0.56
    Positive Logits
    Token           Logit
    "zees"          0.83
    "ufact"         0.79
    "kas"           0.71
    "oola"          0.71
    "culosis"       0.70
    "emouth"        0.69
    "åŃIJ"           0.68
    " Beasts"       0.67
    "rities"        0.66
    "gat"           0.66
    Act Density 0.303%

    No Known Activations