INDEX
    NEGATIVE LOGITS
    Token         Value
    "dır"         0.51
    "да"          0.46
    "ные"         0.45
    "드의"        0.44
    "cknowled"    0.42
    "ώνει"        0.42
    "データの"    0.42
    "دى"          0.41
    "引き"        0.41
    "ники"        0.41
    POSITIVE LOGITS
    Token         Value
    "4"           0.64
    "6"           0.54
    "8"           0.54
    "7"           0.54
    " dostup"     0.49
    "all"         0.48
    " Оста"       0.47
    " faiths"     0.47
    " interna"    0.46
    "5"           0.46
    Activation Density: 0.023%

    No Known Activations