INDEX
    Negative Logits
     bilde        -0.08
     Nr           -0.07
                  -0.07
     மூ           -0.07
     """↵         -0.06
     PAG          -0.06
     relations    -0.06
     Ig           -0.06
    emann         -0.06
     Said         -0.06
    Positive Logits
     accomp       0.08
    eiro          0.08
    /menu         0.08
     מוכ          0.08
     Bleu         0.08
    ',{           0.08
    upunct        0.07
     disciples    0.07
    grade         0.07
     glyc         0.07
    Activation Density: 0.001%

    No Known Activations