INDEX
    Explanations

    expressions of human suffering

    Negative Logits
    Token         Logit
    "ason"        -0.16
    " Dynamo"     -0.15
    " Dun"        -0.14
    "åį«"         -0.14
    " crossing"   -0.14
    " Burl"       -0.14
    " erhalten"   -0.13
    " Opp"        -0.13
    " Kauf"       -0.13
    " alike"      -0.13
    Positive Logits
    Token         Logit
    "fv"          0.15
    "avn"         0.15
    " ков"        0.15
    "747"         0.15
    "Leaf"        0.14
    "³"           0.14
    " Mahar"      0.14
    "ouz"         0.14
    "LineNumber"  0.14
    "ior"         0.14
    Act Density 0.002%

    No Known Activations