INDEX
    Explanations
    NEGATIVE LOGITS
    م               0.47
     όχι            0.45
    ôn              0.45
    (not rendered)  0.44
    portions        0.44
    str             0.44
     وم             0.42
    מ               0.42
    (not rendered)  0.42
    (not rendered)  0.42
    POSITIVE LOGITS
     brauch         0.49
    ńca             0.48
     rencana        0.47
    َرْ             0.46
     cillum         0.46
    ла              0.45
     Bavaria        0.45
     typing         0.44
    (not rendered)  0.44
    (not rendered)  0.44
    Activation Density: 0.001%

    No Known Activations