    Explanations

    "quite" followed by an adjective

    Negative Logits
        Token         Logit
        " by"         1.16
        "یم"          1.13
        "3"           1.08
        "های"         1.00
        "0"           1.00
        "ि"           0.96
        "یش"          0.96
        "</h3>"       0.95
        "ג"           0.95
        (not shown)   0.93
    Positive Logits
        Token         Logit
        "in"          1.71
        "on"          1.52
        "c"           1.43
        "p"           1.42
        "ak"          1.38
        "t"           1.38
        "al"          1.24
        (not shown)   1.23
        "ut"          1.18
        "ار"          1.16
    Activation density: 0.006%

    No Known Activations