INDEX
    Explanations

    The neuron activates on the word “types,” signaling list or categorization cues in the text.

    Negative Logits
    Token         Logit
    " of"         -0.06
    " сю"         -0.06
    "=b"          -0.06
    " نامه"       -0.06
    "_energy"     -0.06
    "\tapp"       -0.06
    "งส"          -0.06
    "-the"        -0.06
    " їй"         -0.06
    " tha"        -0.06
    Positive Logits
    Token         Logit
    " type"       0.11
    " kinds"      0.11
    " kind"       0.11
    " types"      0.10
    " sorts"      0.10
    " tipo"       0.08
    " Types"      0.08
    "-types"      0.08
    "Tipo"        0.08
    " sort"       0.07
    Activation Density: 0.036%

    No Known Activations