INDEX
    Explanations

    expressions related to opinions or beliefs

    Negative Logits
     ourselves
    -0.23
    ，我们
    -0.22
    我们的
    -0.21
     کنیم
    -0.20
     داریم
    -0.19
     Them
    -0.18
     abbiamo
    -0.18
    immel
    -0.18
    Them
    -0.18
    .We
    -0.18
    Positive Logits
     me
    1.05
    me
    0.57
     меня
    0.54
    _me
    0.48
    -me
    0.47
     ME
    0.46
     мне
    0.45
     Me
    0.44
    .me
    0.42
    	me
    0.40
    Act Density 0.261%

    No Known Activations