    Negative Logits
    Token         Logit
    " as"         -1.88
    " sebagai"    -1.20
    "as"          -0.89
    " как"        -0.88
    " као"        -0.85
    "作为"        -0.83
    " jako"       -0.81
    " الاطلاع"    -0.77
    "作為"        -0.77
    " ως"         -0.75
    Positive Logits
    Token         Logit
    " a"          0.91
    " well"       0.80
    " the"        0.71
    " an"         0.71
    "pires"       0.68
    "cribes"      0.66
    " follows"    0.66
    "cription"    0.61
    " part"       0.60
    " opposed"    0.60
    Activation Density: 0.364%

    No Known Activations