    Explanations

    phrases that indicate frequency or repetition of events

    Negative Logits
    ̣                 -0.16
    .scalablytyped    -0.16
    paged             -0.15
    رÙĬÙĥÙĬØ©        -0.14
    shed              -0.14
     áº               -0.14
    iou               -0.14
    ease              -0.14
    次                -0.14
    ìĿ´ìĹIJ          -0.14

    Positive Logits
     blue             0.29
     awhile           0.27
     Blue             0.25
    blue              0.25
    -blue             0.23
     BLUE             0.23
    Blue              0.23
    BLUE              0.22
    aw                0.21
    .blue             0.21
    Activation Density 0.014%

    No Known Activations