INDEX
    Negative Logits
    Token            Logit
    "626"            -0.08
    " dollars"       -0.08
    "okuq"           -0.08
    " GPA"           -0.07
    " Überblick"     -0.07
    " odn"           -0.07
    " décider"       -0.07
    " μ"             -0.07
    "ukan"           -0.07
    " Goal"          -0.07
    Positive Logits
    0.09
     vermijden
    0.09
     adhere
    0.08
     лиш
    0.08
     соблюдать
    0.08
    avoid
    0.08
    alternate
    0.08
    reme
    0.08
     avoid
    0.08
    避免
    0.08
    Activation Density: 0.004%

    No Known Activations