    Negative Logits
    "z"               0.52
    "D"               0.46
    "G"               0.45
    " sleep"          0.43
    "गिर"             0.43
    "sleep"           0.41
    " abone"          0.41
    "א"               0.41
    [not rendered]    0.41
    "bed"             0.41
    Positive Logits
    [not rendered]    0.46
    " stemmed"        0.46
    "ựa"              0.45
    "etermin"         0.42
    "erà"             0.42
    "ියා"             0.42
    " centred"        0.42
    "raum"            0.41
    " tụ"             0.41
    "۴"               0.41
    Activation Density: 0.001%

    No Known Activations