INDEX
    Explanations

    instances of the word "take" and its variations

    Negative Logits
    Token       Logit
    "ampa"      -0.07
    "upe"       -0.06
    "rike"      -0.06
    "strup"     -0.06
    "apur"      -0.06
    "éłħ"       -0.06
    " rem"      -0.06
    "anton"     -0.06
    "ially"     -0.05
    ".coord"    -0.05
    Positive Logits
    Token       Logit
    " shape"    0.17
    "shape"     0.14
    " root"     0.14
    " Shape"    0.14
    "Shape"     0.12
    " hold"     0.12
    " shapes"   0.12
    "hape"      0.11
    "root"      0.11
    "_shape"    0.10
    Activation Density: 0.016%

    No Known Activations