    Explanations

    texts related to respect and social norms

    words related to perspective or viewpoints

    Negative Logits
    pmwiki    -0.73
    cedes     -0.70
    udeb      -0.69
    educ      -0.68
    puting    -0.66
    fml       -0.65
    rama      -0.64
    anyl      -0.64
    driving   -0.63
    izzard    -0.63
    Positive Logits
    pect      1.09
     Ratio    0.94
    orate     0.94
    terness   0.93
    rons      0.88
    pects     0.88
    lihood    0.85
    olate     0.81
    eur       0.79
    anamo     0.78
    Activation Density: 0.006%

    No Known Activations