INDEX
    Explanations

    references to care and responsibility for oneself and others

    Negative Logits
    otal             -0.16
    合格             -0.15
    abor             -0.15
    uras             -0.15
     Bart            -0.14
     Raid            -0.14
    andon            -0.14
    l                -0.14
    obb              -0.13
    partials         -0.13
    Positive Logits
    ooter             0.17
    yntax             0.17
    lsen              0.16
     sick             0.16
    azor              0.15
     yandan           0.14
    GED               0.14
    èħ                0.14
     Sick             0.14
     infrastructure   0.14
    Activation Density: 0.072%

    No Known Activations