INDEX
    Explanations
    NEGATIVE LOGITS
    Token             Logit
    "ing"             -0.76
    "n"               -0.64
    "atio"            -0.61
    "ingly"           -0.59
    "d"               -0.59
    "u"               -0.57
    "y"               -0.57
    "ting"            -0.56
    " Modern"         -0.55
    " modern"         -0.55
    POSITIVE LOGITS
    Token             Logit
    " feroit"         0.89
    " zelve"          0.85
    " avancée"        0.83
    " télécharge"     0.81
    " engraçadas"     0.79
    " motivadoras"    0.76
    " vermelhas"      0.75
    " piú"            0.73
    " animés"         0.73
    " supérieurs"     0.73
    Activation Density: 1.597%

    No Known Activations