    Explanations

    positive adjectives and descriptors expressing approval or admiration

    Negative Logits
    Token         Logit
    "ngth"        -0.93
    "rive"        -0.81
    "ividual"     -0.79
    "lished"      -0.70
    "opez"        -0.66
    "alks"        -0.66
    "perty"       -0.65
    "usalem"      -0.65
    "cipl"        -0.64
    "assemb"      -0.64
    Positive Logits
    Token             Logit
    "soType"          0.81
    " enough"         0.80
    " considering"    0.79
    " ðŁĻĤ"           0.74
    " explan"         0.73
    "ECA"             0.72
    "LY"              0.71
    " NEWS"           0.70
    " ðŁĺ"            0.69
    " XD"             0.68
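
    Two of the positive-logit tokens (ðŁĻĤ and ðŁĺ) look garbled but are
    probably raw bytes shown through a byte-level BPE alphabet. The sketch
    below assumes a GPT-2-style byte-to-character table, which this page does
    not confirm; the helper names token_to_bytes and decode_token are
    hypothetical. Under that assumption, ðŁĻĤ is the four UTF-8 bytes of a
    single emoji, and ðŁĺ is the first three bytes of another emoji in the
    same code-point block.

```python
def bytes_to_unicode():
    """Byte -> printable-character table in the style of GPT-2's byte-level BPE.

    Assumption: the dashboard's tokenizer uses this mapping.
    """
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are shifted to code points starting at 256.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a displayed token (hypothetical helper)."""
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


def decode_token(token: str) -> str:
    """Decode a displayed token as UTF-8; incomplete sequences become U+FFFD."""
    return token_to_bytes(token).decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Leading spaces from the dashboard listing are dropped before decoding.
    for tok in ["ðŁĻĤ", "ðŁĺ"]:
        print(tok, token_to_bytes(tok).hex(" "), decode_token(tok))
```

    Read this way, two of the ten strongest positive logits are emoji
    fragments, which sit alongside informal tokens such as " XD".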
    Activation Density: 0.101%

    No Known Activations