INDEX
    Explanations

    plotting code and labels

    Negative Logits
    Token          Value
    " Align"       1.12
    "Im"           0.91
    "Align"        0.91
    "aligned"      0.86
    "Alignment"    0.85
    " Grounds"     0.83
    "1"            0.81
    " Divided"     0.79
    "It"           0.79
    "Those"        0.79

    Positive Logits
    Token          Value
    (blank)        1.36
    "第壹百"       1.24
    (blank)        1.20
    (blank)        1.19
    (blank)        1.19
    " spowod"      1.16
    (blank)        1.15
    " akibat"      1.15
    (blank)        1.12
    "pOt"          1.12

    Act Density 0.000%

    No Known Activations