    Explanations

    marginalized

    Negative Logits
    " Polynomial"    -0.06
    " planning"      -0.06
    " pool"          -0.06
    " phố"           -0.06
    "Live"           -0.06
    " exit"          -0.06
    " Chem"          -0.06
    " borderline"    -0.06
    " bio"           -0.06
    " Chemistry"     -0.06
    Positive Logits
    " marginalized"  0.07
    "عن"             0.07
    "ιβ"             0.07
    "ày"             0.07
    (token not rendered)  0.07
    "φέ"             0.07
    "erved"          0.07
    " правиль"       0.06
    " Ner"           0.06
    " yandan"        0.06
    Act Density 0.012%

    No Known Activations