INDEX
    Explanations
    Negative Logits
    " unconditional"    -0.09
    " succeeded"        -0.08
    "Regardless"        -0.08
    " interrupted"      -0.08
    " sequel"           -0.07
    " hypert"           -0.07
    " unir"             -0.07
    "Whatever"          -0.07
    " nummer"           -0.07
    "imis"              -0.07
    Positive Logits
    "CHOOL"             0.08
    ".scale"            0.08
    "TED"               0.08
    " осві"             0.08
    "führer"            0.08
    "tum"               0.08
    "Schools"           0.07
    "scale"             0.07
    " adjective"        0.07
    "chool"             0.07
    Act Density 0.001%

    No Known Activations