INDEX
    Explanations

    questions punctuated with a question mark

    Negative Logits
    enen          -0.16
    ¢             -0.15
    eness         -0.14
    ibase         -0.14
    éłĵ           -0.14
    quez          -0.14
    äl            -0.14
    ãĥ§           -0.14
    _DF           -0.13
    ãĤ¥           -0.13

    Positive Logits
    ariat         0.17
    ÑģоÑĢ        0.16
    Affected      0.16
    bia           0.16
    704           0.15
    rama          0.15
    lic           0.15
    Insensitive   0.15
    basket        0.14
    dock          0.14
    Activation Density: 0.033%

    No Known Activations