    Explanations

    repeated word forms and grammatical structures in various languages

    Negative Logits
    Token          Logit
    " myſelf"      -1.09
    " iſt"         -0.99
    " raiſ"        -0.93
    "NameInMap"    -0.93
    " ་་"          -0.93
    " Anſ"         -0.91
    " ſche"        -0.90
    " Diſ"         -0.90
    " faſt"        -0.89
    " ſind"        -0.88
    Positive Logits
    Token             Logit
    "-"               0.66
    ","               0.63
    "<eos>"           0.62
    " of"             0.62
    "?"               0.62
    " -"              0.61
    " ("              0.60
    "[toxicity=0]"    0.56
    "</sub>"          0.56
    "ubereitung"      0.56
    Activation Density: 0.008%

    No Known Activations