INDEX
    Explanations

    instances of questioning societal norms and behaviors

    Negative Logits
    Token        Logit
    "agr"        -0.15
    "athe"       -0.15
    "iker"       -0.15
    "ansa"       -0.14
    "ÑĨÑİ"       -0.14
    "aki"        -0.14
    " sing"      -0.14
    "phin"       -0.14
    " Crunch"    -0.13
    "aga"        -0.13
    Positive Logits
    Token            Logit
    " fart"          0.16
    " dikke"         0.16
    "Mob"            0.15
    " Vance"         0.15
    "oton"           0.15
    "Wars"           0.15
    " Liberation"    0.14
    " beyond"        0.14
    "yaw"            0.14
    "-append"        0.13
    Activation Density: 0.000%

    No Known Activations