INDEX
    Explanations
    Negative Logits
    Token       Logit
    "el"        -0.08
    "l"         -0.07
    "<li"       -0.07
    " lee"      -0.07
    ":r"        -0.07
    "ل"         -0.07
    " falls"    -0.07
    " Cole"     -0.07
    " Fill"     -0.07
    " ii"       -0.06
    Positive Logits
    Token       Logit
    " about"    0.20
    "about"     0.15
    " About"    0.14
    " ABOUT"    0.11
    "About"     0.11
    "-about"    0.10
    "bout"      0.09
    "brate"     0.08
    "ABOUT"     0.08
    ".about"    0.08
    Activation Density: 0.116%

    No Known Activations