INDEX
    Negative Logits
     cheek              -0.08
                        -0.08
                        -0.07
     Dmit               -0.07
                        -0.07
     acquainted         -0.07
     fortnight          -0.07
     spouse             -0.07
     Tuesday            -0.07
     Donnerstag         -0.07   (German: "Thursday")
    Positive Logits
    -specific            0.11
    -dependent           0.10
                         0.09
    -induced             0.09
    -wise                0.09
     влияет              0.09   (Russian: "affects")
     matters             0.08
     affects             0.08
     mismatch            0.08
    ewise                0.08
    Activation Density: 0.063%

    No Known Activations