INDEX
    NEGATIVE LOGITS
    Token          Logit
    "fett"         -0.41
    "itschrift"    -0.41
    "colhead"      -0.41
    " zemi"        -0.40
    " dė"          -0.40
    " noDo"        -0.40
    "matite"       -0.40
    "Pré"          -0.39
    "Musique"      -0.39
    " kwe"         -0.39

    POSITIVE LOGITS
    Token          Logit
    "los"           2.67
    "LOS"           1.79
    "lose"          1.42
    " LOS"          1.40
    "loss"          1.40
    "Los"           1.30
    "losa"          1.26
    " los"          1.24
    "losen"         1.16
    " Los"          1.15

    Activation Density: 0.010%

    No Known Activations