    Negative Logits
    ść
    2.73
     fleste
    2.67
    োহণ
    2.45
    ués
    2.43
    েন্ড
    2.42
    lında
    2.37
     проб
    2.37
     confounding
    2.36
     použ
    2.34
     közel
    2.33
    Positive Logits
    3.29
    т
    3.20
    й
    2.82
    தி
    2.76
    د
    2.71
    zeitig
    2.71
    2.70
    י
    2.62
    рони
    2.61
    tabla
    2.54
    Activation Density: 0.014%

    No Known Activations