INDEX
    Explanations

    references to the act of reading

    Negative Logits
    Token      Logit
    "ask"      -0.17
    "iv"       -0.16
    "ck"       -0.16
    "d"        -0.16
    " oc"      -0.15
    "udad"     -0.15
    "sten"     -0.15
    "ated"     -0.15
    "use"      -0.15
    "ped"      -0.15
    Positive Logits
    Token               Logit
    "just"              0.24
    "/list"             0.23
    "/view"             0.23
    "mitted"            0.22
    " comprehension"    0.21
    "/watch"            0.20
    "åıĸ"               0.20
    "/write"            0.19
    "ied"               0.19
    "iness"             0.18
    Activation Density: 0.072%

    No Known Activations