INDEX
    Explanations
    Negative Logits
     medicine      -0.36
     either        -0.35
    (blank)        -0.34
    führer         -0.34
    myModal        -0.33
    zulegen        -0.33
     vägen         -0.33
     prohibido     -0.33
     Geheimnis     -0.33
     wizy          -0.32
    Positive Logits
    {}/            0.88
    ../            0.81
    (blank)        0.79
    .../           0.74
    !/             0.72
    -/             0.71
    '/             0.71
    ("/            0.69
     $/            0.68
    ./             0.68
    Activation Density: 0.523%

    No Known Activations