    Negative Logits
    Token         Logit
    "-af"         -0.08
    " akong"      -0.08
    " agon"       -0.08
    "-enwe"       -0.08
    " scient"     -0.08
    " umntu"      -0.08
    " versa"      -0.08
    " Gou"        -0.08
    " нафар"      -0.08
    " ahaan"      -0.08
    Positive Logits
    Token                      Logit
    " tve"                     0.09
    " threshold"               0.08
    " ]]↵"                     0.08
    "aped"                     0.07
    " concept"                 0.07
    " spolu"                   0.07
    "ziert"                    0.07
    (whitespace-only token)    0.07
    " Euler"                   0.07
    " rounded"                 0.07
    Activation Density: 0.003%

    No Known Activations