INDEX
    NEGATIVE LOGITS
    "ROY"           -0.09
    " garage"       -0.08
    " Guill"        -0.08
    "embrance"      -0.08
    " vaut"         -0.07
    " cag"          -0.07
    " deix"         -0.07
    "حد"            -0.07
    " cough"        -0.07
    "Covered"       -0.07

    POSITIVE LOGITS
    " profundo"     0.08
    " textbooks"    0.08
    " literacy"     0.08
    " Literacy"     0.08
    "endregion"     0.08
    "_rd"           0.08
    " Cristian"     0.07
    " Tina"         0.07
    " textbook"     0.07
    "\d"            0.07

    Act Density: 0.001%

    No Known Activations