INDEX
    Explanations

    comparison or lack of certain elements

    New Auto-Interp
    Negative Logits
    8         0.81
    7         0.75
    elle      0.74
    1         0.71
    th        0.70
    9         0.68
    2         0.68
    0         0.67
    .         0.67
    కే        0.66
    Positive Logits
                1.09
    Bhagavato   1.07
                1.04
                1.03
    ურთიერთ     1.01
    不同        1.01
    problémy    1.00
    bekannten   0.98
    Dijstra     0.97
                0.97
    Act Density 0.001%

    No Known Activations