INDEX
    Explanations

    sentences that state facts or assertions

    NEGATIVE LOGITS
    ĭ     -4.70
    ģ     -4.69
    Ļª    -4.68
    »¿    -4.47
    Īĺ    -4.44
    į     -4.36
    Ĥ¬    -4.33
    İ     -4.26
    Ģ     -4.23
    ī     -4.23
    POSITIVE LOGITS
     yours         2.03
     hers          2.00
     imperative    1.82
     ours          1.79
     our           1.69
    APTER          1.62
     my            1.58
     an            1.55
     Cookie        1.41
     doubly        1.39
    Act Density 0.302%

    No Known Activations