    Explanations

    references to names or specific nouns

    mentions of individuals and episodes from certain series or events

    Negative Logits
    Token          Logit
    "oat"          -0.93
    "bing"         -0.85
    "ret"          -0.77
    "oths"         -0.75
    "reth"         -0.72
    " Telegram"    -0.68
    "bor"          -0.67
    "rosse"        -0.67
    "gest"         -0.66
    "bage"         -0.64
    Positive Logits
    Token          Logit
    "elson"         0.79
    "letal"         0.79
    "umat"          0.77
    "opoulos"       0.77
    "ewski"         0.75
    "cart"          0.74
    "ansen"         0.74
    "umatic"        0.73
    "emic"          0.73
    "aimon"         0.73
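    The logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A common way feature dashboards produce such lists (assumed here, not confirmed by this page) is to project the feature's decoder direction through the model's unembedding matrix and take the extremes. The sketch below uses toy random weights; `logit_effects`, `decoder_dir`, `W_U`, and the `tok…` vocabulary are all hypothetical stand-ins, so its numbers will not match the values listed.

```python
import numpy as np

def logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumed convention: decoder_dir has shape (d_model,), W_U has shape
    (d_model, vocab_size), and vocab maps token ids to token strings.
    """
    scores = decoder_dir @ W_U            # direct effect on every output logit
    order = np.argsort(scores)            # token ids, most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with random weights (hypothetical, not the real model).
rng = np.random.default_rng(0)
d_model, vocab_size = 16, 100
vocab = [f"tok{i}" for i in range(vocab_size)]
decoder_dir = rng.normal(size=d_model)
decoder_dir /= np.linalg.norm(decoder_dir)   # unit-norm decoder direction
W_U = rng.normal(size=(d_model, vocab_size))

negative, positive = logit_effects(decoder_dir, W_U, vocab, k=5)
for token, logit in positive:
    print(f"{token!r:>10} {logit:+.2f}")
```

    Sorting once with `np.argsort` and reading both ends gives the negative and positive lists in a single pass; real dashboards may additionally rescale or filter tokens.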
    Activation Density: 0.045%
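    Activation density is usually reported as the share of sampled token positions on which the feature's activation is above zero; at 0.045%, this feature would fire on roughly 1 token in 2,200. A minimal sketch, assuming a flat array of per-token activations and a firing threshold of 0 (the function name `activation_density` and the sample data are hypothetical):

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions where the feature activation exceeds threshold."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Hypothetical sample: 9 firing positions out of 20,000 tokens.
acts = np.zeros(20_000)
acts[:9] = 1.0
print(f"{activation_density(acts):.3f}%")
```

    With 9 active positions out of 20,000, this reproduces the 0.045% figure shown above.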

    No Known Activations