INDEX
    Explanations
    No Explanations Found
    Negative Logits
     ENT        -0.07
                -0.07
    nel         -0.07
     Painter    -0.07
     white      -0.07
    igon        -0.06
    ruits       -0.06
    ellig       -0.06
    enville     -0.06
     drained    -0.06
    Positive Logits
     alumnos    0.07
    تباد        0.07
    beros       0.07
    促进了      0.07
     её         0.07
    限制        0.06
     =================================================  0.06
     unseren    0.06
    וחר         0.06
    ellidos     0.06
    Act Density 0.001%

    No Known Activations