    Explanations

    This neuron detects expressions of desire or intent, such as “want,” “like,” or “would like.”

    Negative Logits
     BUTTON    -0.07
    	flags    -0.07
     Think    -0.07
     girls    -0.07
     Girls    -0.07
     shifted    -0.06
     tourists    -0.06
    _t    -0.06
     swallowed    -0.06
     remain    -0.06
    Positive Logits
     stockholm    0.07
    _ALARM    0.07
     şar    0.06
     velkou    0.06
     frauen    0.06
     fri    0.06
     atd    0.06
     mexico    0.06
    _Do    0.06
     друго    0.06
    Act Density 0.034%
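
As a rough illustration of what an activation density figure like the one above might represent, the sketch below computes the fraction of tokens on which a neuron's activation exceeds a threshold. This is an assumption about how the metric is defined, not something the page states; the function name `activation_density`, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron
    fires at all. The page itself does not define the metric.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 tokens, of which only 3 activate the neuron.
acts = [0.0] * 10_000
acts[10] = 0.5
acts[200] = 1.2
acts[9_000] = 0.1

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.030%
```

Under this reading, a density of 0.034% would mean the neuron fires on roughly 3 to 4 tokens out of every 10,000.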

    No Known Activations