INDEX
    Explanations

    describing code scripts

    Negative Logits
    也不是              1.07
     rather             1.06
     another            1.01
     instead            1.01
     other              0.98
     not                0.96
    而不是              0.91
    rather              0.88
    并不是              0.86
    ไม่ใช่                0.86

    Positive Logits
    </th>               0.84
     (+)                0.80
                        0.79
    =""><               0.78
     -->                0.77
                        0.73
    Connector           0.73
    ‌ترین                0.73
    ጣጠ                  0.73
    ---------------+    0.71
    Act Density 2.235%

    No Known Activations