INDEX
    Explanations

    phrases related to manipulation and deception

    Negative Logits
    Token             Logit
    " rele"           -0.43
    "bares"           -0.40
    "TaskId"          -0.39
                      -0.39
    "OrderService"    -0.37
    "UAGES"           -0.37
    " gangs"          -0.36
    " Hands"          -0.36
    " Bombs"          -0.36
    " الحره"          -0.35
    Positive Logits
    Token             Logit
    " believing"      0.64
    " fooled"         0.60
    "SharedCtor"      0.58
    "invokeLater"     0.57
    " croy"           0.56
    " deceived"       0.56
    "ImageContext"    0.55
    " geloof"         0.54
    "tanleria"        0.53
    " croire"         0.52
    Activation Density: 0.097%

    No Known Activations