INDEX
    Explanations

    names of individuals involved in various news and political events

    references to notable individuals and their actions or claims

    Negative Logits
    .:      -1.02
    .(      -0.91
    %.      -0.90
    +.      -0.84
    .<      -0.82
    :(      -0.81
    *.      -0.80
    !".     -0.80
    .       -0.79
    .–      -0.79
    Positive Logits
     )]     0.93
    )]      0.85
     ?)     0.83
    *)      0.77
    ?)      0.76
    })      0.74
    ')      0.71
    )\      0.70
    ?),     0.69
    )}      0.69
    Activation Density: 0.991%

    No Known Activations