INDEX
    Explanations
    NEGATIVE LOGITS
    ק  2.28
    ח  1.97
    ك  1.95
    ch  1.84
    ادرة  1.75
    ના  1.70
     antérieurs  1.70
     kandungan  1.69
    ש  1.69
    ков  1.66
    POSITIVE LOGITS
     hindering  1.73
     profusely  1.73
     hangover  1.69
    hearted  1.68
    的な  1.66
     knowingly  1.64
    erweise  1.63
    d  1.62
     inwardly  1.60
     rescaling  1.59
    Activation Density: 0.029%

    No Known Activations