INDEX
    Explanations

    pronouns, particularly possessive pronouns associated with individuals

    Negative Logits
     CIT
    -0.55
    mun
    -0.50
    Nazionale
    -0.49
    SetTitle
    -0.48
     ditto
    -0.48
    fraid
    -0.47
     والن
    -0.47
    slice
    -0.45
     balles
    -0.45
    erunner
    -0.45
    Positive Logits
     }}"></
    0.74
     INSEE
    0.73
    ')}}
    0.67
    }());
    0.65
    '));
    0.64
    meanor
    0.63
     vuitton
    0.62
    )});
    0.62
    @"
    0.62
    >());
    0.62
    Activation Density 0.151%

    No Known Activations