INDEX
    Explanations

    instances of conditional phrasing and auxiliary verbs

    Negative Logits
    tha        -0.17
    zin        -0.16
     Flynn     -0.14
    uced       -0.14
    ÑĢд        -0.14
    aping      -0.14
    izoph      -0.13
    atar       -0.13
     inn       -0.13
    aved       -0.13
    Positive Logits
    egen       0.17
    773        0.17
    spot       0.16
    olta       0.15
     Finger    0.15
    ritos      0.15
     spots     0.14
    chu        0.14
    иÑģк       0.14
    975        0.14
    Act Density 0.179%

    No Known Activations