INDEX
    Explanations
    Negative Logits
    Token           Logit
    " precise"      -1.65
    "precise"       -1.48
    " Precise"      -1.36
    "Precise"       -1.36
    " exact"        -1.19
    "exact"         -1.13
    " précise"      -1.07
    " genaue"       -1.05
    " precisely"    -1.02
    " Exact"        -0.97
    Positive Logits
    Token           Logit
    "ly"            1.04
    "er"            0.82
    "ley"           0.63
    "ca"            0.60
    "ce"            0.58
    "LY"            0.58
    "ment"          0.57
    "ks"            0.56
    "lt"            0.56
    "ities"         0.54
    Activation Density: 0.042%

    No Known Activations