INDEX
    Explanations

    combining partial words into full words

    NEGATIVE LOGITS
    "ش"  0.56
    "ש"  0.54
    "ח"  0.48
    "نا"  0.46
    (blank)  0.44
    "smöglichkeiten"  0.42
    "م"  0.41
    " caram"  0.40
    (blank)  0.40
    " reconoc"  0.38
    POSITIVE LOGITS
    " "  0.49
    "I"  0.46
    "P"  0.41
    " l"  0.40
    "ac"  0.39
    "R"  0.39
    "<0xE2>"  0.36
    "ak"  0.36
    "ird"  0.35
    "Y"  0.35
    Act Density 0.105%

    No Known Activations