INDEX
    NEGATIVE LOGITS
    oh       1.84
    oleh     1.69
    ids      1.54
    ö        1.53
    (blank)  1.48
    ime      1.48
    ons      1.43
    ach      1.41
    දි       1.39
    arin     1.38
    POSITIVE LOGITS
    م        2.33
    ла       1.88
    י        1.86
    و        1.81
    (blank)  1.51
    ي        1.48
     UIS     1.46
    ك        1.44
    ча       1.37
     RAF     1.36
    Activation Density: 0.001%

    No Known Activations