Negative Logits
    Token          Logit
    "<bos>"        -0.54
    "#"            -0.40
    "indest"       -0.39
    "ąp"           -0.37
    " terci"       -0.35
    "do"           -0.35
    "chuckles"     -0.34
    "return"       -0.34
    "Ell"          -0.34
    "]')"          -0.33
Positive Logits
    Token          Logit
    " Languages"    1.78
    " languages"    1.69
    "Languages"     1.55
    "languages"     1.50
    " Sprachen"     1.19   (German)
    "anguages"      1.03
    " lingue"       1.02   (Italian)
    " langues"      0.92   (French)
    " idiomas"      0.92   (Spanish/Portuguese)
    " lenguas"      0.89   (Spanish)
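Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below assumes that method; this page does not say how its logits were computed, and `decoder_dir`, `W_U`, `vocab`, and `top_logit_effects` are hypothetical names. It also ignores the final layer norm, which a real pipeline may handle differently.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Direct effect of one feature direction on every output token's logit.

    Assumption (hypothetical setup): decoder_dir has shape (d_model,),
    W_U has shape (d_model, vocab_size), and vocab lists token strings.
    Final layer norm is ignored. Returns (negative, positive) lists of
    (token, logit) pairs, most extreme first.
    """
    logits = decoder_dir @ W_U              # shape (vocab_size,)
    order = np.argsort(logits)              # ascending order of logit effect
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, vocabulary of 4 tokens (values are made up).
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[2.0, -1.0, 0.5, 0.0],
                [0.0,  3.0, 1.0, -2.0]])
vocab = [" languages", "<bos>", "Sprachen", "return"]

neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(neg)  # [('<bos>', -1.0), ('return', 0.0)]
print(pos)  # [(' languages', 2.0), ('Sprachen', 0.5)]
```

Because only the decoder direction and the unembedding enter the product, this measures the feature's direct push on output tokens, not any effect routed through later layers.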
    Activation density: 0.005%
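A minimal sketch of how an activation-density figure can be computed, assuming it means the share of sampled token positions where the feature's activation is strictly positive (the page gives neither the threshold nor the sample size). At 0.005%, that is about one firing per 20,000 tokens; the toy array below reproduces exactly that ratio. `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(acts):
    """Fraction of token positions where the feature fires.

    Assumption: "fires" means activation > 0 (hypothetical threshold).
    """
    acts = np.asarray(acts)
    return np.count_nonzero(acts > 0) / acts.size

# Toy example: one firing position out of 20,000 tokens.
acts = np.zeros(20_000)
acts[123] = 4.2
density = activation_density(acts)
print(f"{density:.3%}")  # 0.005%
```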

    No Known Activations