INDEX
    Explanations
    No Explanations Found
    Negative Logits
    i
    1.31
    י
    1.21
     can
    1.15
    o
    1.06
    :
    1.06
    a
    1.05
     in
    1.02
    0.99
     d
    0.96
    ה
    0.91
    Positive Logits
    𝟎
    0.75
    0.69
     vitth
    0.66
    thand
    0.64
    0.64
     spéciales
    0.63
    クル
    0.62
    sparsebundle
    0.62
    0.62
    ต์
    0.61
    Act Density 2.092%

    No Known Activations