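    Negative Logits
    Token                    Value
    (token not recovered)    0.76
    (token not recovered)    0.73
    "Mâc"                    0.70
    (token not recovered)    0.69
    " afterDir"              0.67
    "ຽງ"                     0.65
    " ராஜ"                   0.65
    (token not recovered)    0.65
    (token not recovered)    0.64
    "្វី"                     0.62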
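    Positive Logits
    Token                    Value
    "H"                      2.03
    " H"                     2.02
    (token not recovered)    1.95
    (token not recovered)    1.92
    " h"                     1.84
    (token not recovered)    1.82
    (token not recovered)    1.80
    " HT"                    1.79
    " HC"                    1.76
    (token not recovered)    1.75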
    Activation Density: 0.927%

    No Known Activations