    Explanations

    original intent or meaning

    Negative Logits
    Token               Logit
    "-"                 0.63
    "行う"              0.62
    "'"                 0.62
    " ventricle"        0.61
    "お金"              0.61
    (no token shown)    0.60
    " dakika"           0.59
    " basura"           0.59
    " diffraction"      0.58
    " wedges"           0.57
    Positive Logits
    Token               Logit
    "original"          1.34
    " originale"        1.27
    " Original"         1.24
    " original"         1.23
    "Original"          1.14
    " ORIGINAL"         0.99
    " orijinal"         0.98
    " ursprüng"         0.98
    " оригі"            0.93
    " оригина"          0.93
    Act Density 0.114%

    No Known Activations