INDEX
    Explanations

    sentences discussing varying perspectives on a specific issue

    Negative Logits
     impra      -2.57
     maneu      -2.52
     increa     -2.49
     emphat     -2.46
     affor      -2.45
     milf       -2.42
     hairc      -2.41
     scrat      -2.41
     suscep     -2.41
     disagre    -2.40
    Positive Logits
     He         1.19
     “          1.17
     "          1.16
     She        1.05
     «          1.05
                1.03
     They       1.03
    ↵↵          0.98
     ”          0.98
     „          0.97
    Activation Density 0.270%

    No Known Activations