    Explanations

    different languages and contexts

    NEGATIVE LOGITS
    "VICES"          0.40
    "ρον"            0.40
    "𝔯"              0.39
    " inverted"      0.39
    " contractions"  0.39
    " espécie"       0.39
    "すなわち"        0.39
    " carrinho"      0.38
    " same"          0.38
    "мі"             0.38
    POSITIVE LOGITS
    " Unlike"        0.46
    "--"             0.43
    "াবেন"           0.42
    " സ്വാ"          0.41
    "vt"             0.41
    " تطبيقات"       0.40
    " Herstellung"   0.39
    "Heter"          0.39
    " Heter"         0.38
    " Tayyip"        0.38
    Act Density 0.001%

    No Known Activations