INDEX
    Explanations

    references to attention and detail

    Negative Logits
    ped         -0.19
    oce         -0.16
    odont       -0.16
    oca         -0.16
    ping        -0.15
    aven        -0.15
    hest        -0.15
    brook       -0.15
    uden        -0.15
    abin        -0.14
    Positive Logits
     paid       0.27
     span       0.23
    al          0.23
     Paid       0.22
     spans      0.21
    Paid        0.20
    åĬĽ         0.19
    paid        0.19
     grabbing   0.19
    -paid       0.18
    Activation Density: 0.017%

    No Known Activations