INDEX
    Explanations

    conjunctions that introduce contrast or exceptions

    Negative Logits
    zeń            -0.15
    istrovství     -0.14
    olie           -0.14
    عب             -0.13
    ocre           -0.13
    mour           -0.13
    crest          -0.13
    .hw            -0.13
    ekk            -0.13
     правило       -0.13
    Positive Logits
    /or            0.15
    €“             0.14
    ра             0.14
    verts          0.14
    atr            0.14
    lem            0.13
    chn            0.13
    icles          0.13
    radient        0.13
    /OR            0.13
    Act Density 0.287%

    No Known Activations