INDEX
    Explanations

    premise or foundational idea

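    Negative Logits
      Token                  Value
      "a"                    1.57
      "de"                   1.31
      "د"                    1.30
      "માં"                  1.26
      (token not rendered)   1.10
      (token not rendered)   1.09
      "ру"                   1.08
      " a"                   1.07
      " In"                  1.05
      "yl"                   1.02

    Positive Logits
      Token                  Value
      " கொஞ்சம்"             0.92
      " agricoles"           0.92
      "uds"                  0.91
      " работников"          0.90
      "КС"                   0.89
      "водить"               0.88
      " einen"               0.86
      (token not rendered)   0.85
      "يا"                   0.85
      "acide"                0.85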
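    The page does not say how its logit tables were produced. A common approach for SAE feature dashboards is to multiply the feature's decoder direction by the model's unembedding matrix, which yields one score per vocabulary token, and then list the highest and lowest scores. The minimal sketch below shows that ranking step under this assumption. The function names `logit_effects` and `top_and_bottom` are hypothetical, and the sketch returns signed scores.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_row: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: project a feature's decoder direction through
    the unembedding matrix.

    decoder_row has length d_model; unembed is d_model x vocab_size.
    Returns one score per vocabulary token (assumed method, not confirmed
    by the page).
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_row[i] * unembed[i][t] for i in range(len(decoder_row)))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return the k highest-scoring and k
    lowest-scoring (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]          # most negative first
    top = ranked[::-1][:k]       # most positive first
    return top, bottom


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of three tokens.
    scores = logit_effects([1.0, -0.5], [[2.0, 0.0, -1.0], [0.0, 1.0, 4.0]])
    top, bottom = top_and_bottom(scores, ["x", "y", "z"], 2)
    print(top)     # [('x', 2.0), ('y', -0.5)]
    print(bottom)  # [('z', -3.0), ('y', -0.5)]
```

    On a real model the vocabulary has tens of thousands of entries, so a vectorized matrix product would replace the nested loop. The ranking logic stays the same.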
    Activation density: 0.040%
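    The activation density above can be read as the share of tokens on which the feature fires. A density of 0.040% corresponds to about 4 tokens in 10,000. The sketch below assumes that density is the percentage of tokens whose activation is strictly above a threshold of zero. The page states neither the dataset nor the threshold, and `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float],
                       threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose feature activation
    exceeds `threshold` (assumed definition of "activation density")."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # 4 active tokens out of 10,000 gives a density of 0.04%.
    acts = [0.0] * 9996 + [0.3] * 4
    print(f"{activation_density(acts):.3f}%")  # 0.040%
```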

    No Known Activations