INDEX
    Explanations

    references to people in positions of power or authority

    Negative Logits
    Token              Logit
    "LEncoder"         -0.68
    " مشين"            -0.68
    "DockStyle"        -0.62
    "principalTable"   -0.62
    " ujednoznacz"     -0.58
    " Parkway"         -0.54
    "\{\\"             -0.54
    "'}),"             -0.54
    "zzard"            -0.53
    "()]\n"            -0.52
    Positive Logits
    Token              Logit
    " entourage"       0.75
    " bodyguard"       0.60
    " aides"           0.55
    "ARGB"             0.52
    " followers"       0.52
    "hilt"             0.50
    "orylation"        0.49
    " personales"      0.49
    " flatter"         0.48
    " pessoais"        0.48
    Activation Density: 0.329%

    No Known Activations