INDEX
    Explanations

    news-related information such as breaking news updates, statements from public figures, and detailed accounts of events

    Negative Logits
    Token               Logit
    "."                 -0.82
    ","                 -0.75
    ";"                 -0.74
    " are"              -0.72
    "/"                 -0.72
    "..."               -0.71
    (no token shown)    -0.70
    "↵↵"                -0.69
    " and"              -0.69
    (no token shown)    -0.68
    Positive Logits
    Token               Logit
    " Mlle"             1.57
    " emphat"           1.56
    " vété"             1.55
    " dovr"             1.53
    " increa"           1.52
    " affor"            1.52
    " sappi"            1.52
    " hentai"           1.51
    " milf"             1.51
    " unlaw"            1.50
    Activation Density: 0.614%

    No Known Activations