INDEX
    Explanations

    explaining, outlining, describing

    Negative Logits
    Token             Logit
    " într"           0.42
    "已经在"          0.39
    "被人"            0.38
    " যখন"            0.38
    " সম্মুখীন"        0.38
    "ผ่าน"            0.37
    " работают"       0.37
    " бывают"         0.37
                      0.36
    " agus"           0.36

    Positive Logits
    Token             Logit
    " describing"     1.11
    " explaining"     1.07
    " indicating"     1.01
    " outlining"      1.01
    " indiquant"      0.97
    " detailing"      0.90
    " specifying"     0.89
    " explicando"     0.86
    "indicating"      0.83
    " stating"        0.82
    Activation Density: 0.155%

    No Known Activations