INDEX
    Explanations

    references to social dynamics and interactions, particularly in contexts of power and consumerism

    Negative Logits
    réhen
    -0.66
     snippetHide
    -0.65
    ]-->
    -0.65
     hiszen
    -0.62
    />";
    -0.62
    ]]
    
    -0.62
     }],
    -0.62
    mişti
    -0.61
    gewöhn
    -0.57
     vieles
    -0.57
    Positive Logits
     fucking
    0.99
     FUCKING
    0.90
     goddamn
    0.88
     ͡°
    0.88
    fucking
    0.87
     fuckin
    0.83
     fuck
    0.81
     motherfucker
    0.79
    0.78
     Fucking
    0.78
    Act Density 1.077%

    No Known Activations