INDEX
    Explanations

    references to social or political manipulation and control

    Negative Logits
    principalTable      -0.77
     ModelExpression    -0.70
    ImageContext        -0.68
     تضيفلها            -0.65
    mergeFrom           -0.64
     GenerationType     -0.62
    GEBURTSDATUM        -0.62
    .",                 -0.61
    SequentialGroup     -0.61
     useDispatch        -0.60
    Positive Logits
     supposedly         0.53
     mierda             0.52
     ostensibly         0.52
     Profitez           0.51
     sanitaires         0.48
                        0.47
     mentale            0.47
     supuestamente      0.47
     Bilder             0.47
     Bruxelles          0.47
    Activation Density 0.787%

    No Known Activations