INDEX
    Explanations

    phrases indicating conviction or certainty

    Negative Logits
    did
    -0.21
     did
    -0.19
     DID
    -0.19
    Did
    -0.18
     Did
    -0.18
    shall
    -0.17
    does
    -0.16
    ahl
    -0.16
    didn
    -0.16
    λλι
    -0.15
    Positive Logits
     is
    0.33
     are
    0.32
     ARE
    0.28
     was
    0.28
     是
    0.26
     adalah
    0.26
     является
    0.25
     являются
    0.24
    _are
    0.24
    是
    0.23
    Act Density 0.170%

    No Known Activations