INDEX
    Explanations
    Negative Logits
    ע    1.81
    ح    1.52
    িয়া    1.30
    ના    1.30
    [token not shown]    1.27
    [token not shown]    1.23
    [token not shown]    1.20
    [token not shown]    1.19
     arquitet    1.18
    j    1.18
    Positive Logits
    forderungen    1.25
    [token not shown]    1.22
     manhood    1.15
    ျေး    1.14
    simpl    1.12
    ायचे    1.12
    baren    1.11
    тара    1.08
    frage    1.07
    beard    1.07
    Activation Density: 0.019%

    No Known Activations