    Explanations

    measuring specific factors

    Negative Logits
    да    0.63
    Б    0.58
    на    0.55
    М    0.53
    ד    0.52
    ETER    0.50
    О    0.50
    ста    0.49
    Ти    0.48
    Ли    0.48
    Positive Logits
     før    0.44
     initial    0.42
     قبل    0.42
     sebelum    0.42
     ingin    0.41
     potentiel    0.40
     أثناء    0.40
     කිරීමට    0.40
     πριν    0.40
     trước    0.39
    Act Density 0.008%

    No Known Activations