INDEX
    Explanations

    German words and phrases

    Negative Logits
    Token             Logit
    "pton"            -0.59
    " Canad"          -0.54
    " Cameron"        -0.54
    "plex"            -0.51
    "atown"           -0.49
    " disenfranch"    -0.49
    "vier"            -0.49
    "oleon"           -0.48
    " Horowitz"       -0.48
    "abwe"            -0.48
    Positive Logits
    Token             Logit
    "ŀ"               0.70
    "ness"            0.63
    "ú"               0.62
    "ener"            0.61
    "ë"               0.60
    "itled"           0.59
    "emption"         0.57
    "ure"             0.57
    "û"               0.57
    "raction"         0.56
    Activation Density: 0.493%

    No Known Activations