    Negative Logits
    fleisch   -0.64
    leſs      -0.57
    brü       -0.54
    ing       -0.54
     dumne    -0.53
    atki      -0.53
              -0.53
    Blow      -0.52
     вод      -0.52
    ütü       -0.51

    Positive Logits
    ,",        1.50
    )",        1.49
    )',        1.41
    >",        1.41
    }`,        1.40
    ?",        1.39
    ]',        1.39
    ]",        1.38
    \"",       1.36
    %",        1.34
    Activation Density: 0.060%

    No Known Activations