INDEX
    Explanations

    pronouns indicating personal experience and relationships

    Negative Logits
    од         -0.54
    out        -0.54
    The        -0.52
    One        -0.52
    st         -0.51
    on         -0.51
    ly         -0.51
    ou         -0.51
    (blank)    -0.50
    one        -0.50
    Positive Logits
    RetentionPolicy    0.95
    AutoScaleMode      0.94
    Autoritní          0.94
    TagMode            0.92
    LEncoder           0.92
    phazard            0.91
    abetes             0.91
    uxxxx              0.89
    NUMX               0.89
     nahilalakip       0.87
    Activation Density: 0.152%

    No Known Activations