    Explanations

    quantitative/directional relationships

    The neuron fires on the literal token “positive” when labeling text sentiment.

    Negative Logits
    -0.07
     cancers  -0.07
    μέ  -0.07
     від  -0.07
    	ui  -0.06
    (bit  -0.06
    '%  -0.06
    -0.06
     کامپی  -0.06
     будто  -0.06
    Positive Logits
     Cosmos  0.06
     BODY  0.06
     inefficient  0.06
    -security  0.06
     domaine  0.06
     entreprises  0.06
    church  0.06
    Cliente  0.06
    _Game  0.06
    ğer  0.06
    Activation Density: 0.001%

    No Known Activations