INDEX
    Explanations

    instances of deception and pretense in actions or identities

    Negative Logits
    Token               Logit
    ]));                -0.79
    IntoConstraints     -0.75
     "..\..\..\         -0.68
    DeleteBehavior      -0.67
     <>",               -0.64
    ])));               -0.63
     purpoſe            -0.61
    ]){                 -0.61
    HideFlags           -0.61
     "..\..\            -0.60
    Positive Logits
    Token               Logit
     pretended          0.76
     pretend            0.76
     pretending         0.73
     pretends           0.71
     feign              0.62
     pretense           0.61
     falsely            0.61
    ふり                0.59
     fake               0.59
     aparentemente      0.56
    Activation Density: 0.369%

    No Known Activations