    Explanations

    words related to political events and actions

    Negative Logits
    yip            -0.84
    jri            -0.68
    eatures        -0.67
    apest          -0.66
    hops           -0.65
    appropriate    -0.65
    ancial         -0.64
    ittens         -0.63
     incon         -0.62
    é¾įå           -0.61
    Positive Logits
    ciating        1.05
    ĸļ             0.83
    achment        0.83
    enment         0.82
    ãĤ¨ãĥ«         0.78
    wikipedia      0.77
    yll            0.71
    emy            0.67
    emies          0.62
    ãĥĢ            0.61
    Activation Density: 0.034%

    No Known Activations
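
Several tokens in the logit lists (for example `ãĤ¨ãĥ«` and `ãĥĢ`) look like mojibake but are probably raw vocabulary strings from a byte-level BPE tokenizer. The sketch below assumes the GPT-2 byte-to-character table, which the page itself does not confirm, and maps such strings back to readable text. The helper names `gpt2_byte_table` and `decode_token` are illustrative, not part of any dashboard API.

```python
def gpt2_byte_table():
    """Build the byte -> printable-character table used by GPT-2 style byte-level BPE.

    Assumption: the dashboard's tokenizer uses this standard table.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are shifted into the range starting at U+0100.
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in gpt2_byte_table().items()}


def decode_token(token: str) -> str:
    """Decode one vocabulary string to text.

    Tokens that hold only part of a multi-byte UTF-8 sequence cannot be
    decoded fully; the undecodable bytes become U+FFFD.
    """
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĤ¨ãĥ«", "ãĥĢ", "wikipedia"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, `ãĤ¨ãĥ«` decodes to `エル` and `ãĥĢ` to `ダ`. Tokens such as `ĸļ` hold only UTF-8 continuation bytes, so they have no standalone reading and are best left in their raw form.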