INDEX
    Explanations

    multiple choice options

    Negative Logits
     kinks           0.38
     overtook        0.36
     balk            0.36
    selves           0.36
     breakthroughs   0.35
     गोरख            0.35
     könnt           0.35
     probleme        0.35
     dominated       0.34
     separates       0.34
    Positive Logits
    v      0.38
    ir     0.36
    ش      0.32
    j      0.32
    у      0.32
    et     0.31
    д      0.30
    used   0.29
    se     0.29
    с      0.29
    Act Density 0.961%

    No Known Activations