INDEX
    Explanations

    references to individuals, especially pronouns and possessive forms

    Negative Logits
    Token           Logit
    ãĤ¤ãĥ«          -0.17
    ascus           -0.15
    awan            -0.14
    ient            -0.14
    .utilities      -0.14
    -navbar         -0.14
    wan             -0.14
    ICO             -0.14
    usk             -0.13
    stal            -0.13
    Positive Logits
    Token           Logit
    uby             0.15
    tdown           0.15
    igy             0.14
     æĻ´            0.14
    ÑĥÑĢи           0.13
    ucid            0.13
    را              0.13
    ICODE           0.13
    erton           0.13
    elor            0.13
    Activation Density: 0.043%

    No Known Activations