INDEX
    NEGATIVE LOGITS
    <bos>    -0.80
     الحره    -0.61
     initComponents    -0.57
    ."]    -0.57
    .*")]    -0.56
    .’”    -0.56
    ).]    -0.55
    .'"    -0.54
    ])));    -0.54
    >*/    -0.51
    POSITIVE LOGITS
     a    0.58
    abella    0.57
     Brahma    0.55
     each    0.55
     Shiva    0.55
     base    0.53
    mặt    0.52
    mvn    0.52
     Bohr    0.52
    lossians    0.52
    Activation Density: 0.003%

    No Known Activations