    Explanations

    words associated with social critique and moral accountability

    Negative Logits
    ittel         -0.15
    Enumerator    -0.15
    veis          -0.14
    éľ            -0.14
    -Headers      -0.14
    è¾°           -0.14
    rou           -0.14
    orrect        -0.13
    ollider       -0.13
    persona       -0.13
    Positive Logits
     Perc         0.15
     nackte       0.14
     :::          0.14
    447           0.14
     Pare         0.14
    uve           0.14
     Kling        0.14
    gew           0.14
     Rudd         0.13
     Middle       0.13
    Activation Density: 0.333%

    No Known Activations