    Explanations

    The neuron detects expressions of apology or uncertainty (e.g. “I’m sorry,” “I am not familiar”).

    Negative Logits
     demokrat      -0.06
     mascul        -0.06
     undergo       -0.06
                   -0.06
     humility      -0.06
     cortical      -0.06
    ивать          -0.06
     visceral      -0.06
     název         -0.06
    _inventory     -0.06
    Positive Logits
    	window        0.07
    	element       0.06
     جر            0.06
    005            0.06
    hora           0.06
     torment       0.06
     Donate        0.06
    |:             0.06
     Flexible      0.06
    <t             0.06
    Activation Density: 0.058%

    No Known Activations