    Explanations

    phrases related to political discussions

    phrases indicating a sense of negativity or hardship

    Negative Logits
    " metic"       -0.68
    " Vector"      -0.59
    "omorphic"     -0.58
    " oun"         -0.57
    " Owl"         -0.56
    " mosqu"       -0.56
    " citiz"       -0.56
    " blanket"     -0.54
    "ected"        -0.54
    " Frontier"    -0.54

    Positive Logits
    "s"             1.85
    "ses"           1.41
    "sb"            1.20
    "sat"           1.08
    "sg"            1.04
    "sets"          1.04
    "si"            1.04
    "itates"        1.03
    "ski"           1.02
    "ends"          1.02
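
    Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. The page does not say how its numbers were computed, so treat the following as a minimal sketch under that assumption. The function names, the toy 2-dimensional model, and all matrix values are hypothetical and made up for illustration.

```python
from typing import List, Sequence, Tuple


def logit_effects(direction: Sequence[float],
                  w_unembed: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: dot a decoder direction with every vocabulary column.

    w_unembed is a d_model x vocab_size matrix stored as a list of rows.
    """
    vocab_size = len(w_unembed[0])
    return [
        sum(direction[i] * w_unembed[i][j] for i in range(len(direction)))
        for j in range(vocab_size)
    ]


def top_and_bottom(tokens: Sequence[str], effects: Sequence[float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    pairs = list(zip(tokens, effects))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy example (assumed values): 2-dimensional residual stream, 4-token vocabulary.
tokens = ["s", "ses", " Vector", " Owl"]
w_unembed = [
    [1.0, 0.5, -0.4, -0.2],
    [0.5, 0.8, -0.1, -0.3],
]
direction = [1.0, 0.5]

pos, neg = top_and_bottom(tokens, logit_effects(direction, w_unembed), k=2)
print(pos)  # most-promoted tokens first
print(neg)  # most-suppressed tokens first
```

    Sorting the full list twice is simple rather than fast; for a real vocabulary of tens of thousands of tokens, a partial selection such as heapq.nlargest would avoid the full sort.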

    Activation Density: 0.160%
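
    Activation density is usually read as the share of token positions on which the feature fires at all. Assuming that definition (a strictly positive activation counts as firing), the figure can be sketched as below. The function name, the threshold parameter, and the sample activations are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions where the feature exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Made-up data: the feature fires on 16 of 10,000 tokens.
acts = [0.0] * 10_000
for i in range(16):
    acts[i * 500] = 1.3

print(f"{activation_density(acts):.3f}%")  # 0.160%
```

    At a density of 0.160%, a feature fires on roughly one token in 625, so a sample of only a few thousand tokens may contain no activations at all.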

    No Known Activations