INDEX
    Explanations

    personal reflections and expressions of disbelief about societal issues

    Negative Logits
    altar
    -0.07
     supposed
    -0.07
    wap
    -0.07
     вдруг
    -0.07
    嘛
    -0.07
    ПК
    -0.06
    ijken
    -0.06
     maybe
    -0.06
     MAY
    -0.06
     ведь
    -0.06
    Positive Logits
     nowhere
    0.07
    ambre
    0.07
     absolutely
    0.07
     ikke
    0.07
    ddy
    0.06
    ategor
    0.06
    -h
    0.06
     Absolutely
    0.06
    ivé
    0.06
     không
    0.06
    Act Density 0.028%

    No Known Activations