INDEX
    Explanations

    The neuron detects occurrences of the adjective “optimal.”

    Negative Logits
    Token         Logit
    " person"     -0.07
    " guy"        -0.07
    " created"    -0.07
    " hide"       -0.07
    " made"       -0.07
    " friend"     -0.06
    "-expand"     -0.06
    " girls"      -0.06
    "-desc"       -0.06
    " animals"    -0.06
    Positive Logits
    Token             Logit
    " optimal"         0.09
                       0.08
    " optimum"         0.08
    " Robbins"         0.07
    "lepší"            0.07
    "oint"             0.07
    " Salmon"          0.07
    " kval"            0.07
    " unanimously"     0.07
    " monetary"        0.07
    Act Density 0.006%
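
The page does not say how the activation density figure is computed. A minimal sketch, assuming it is the share of sampled token positions on which the neuron's activation exceeds a threshold: under that reading, 0.006% corresponds to roughly 6 active positions per 100,000. The function name `activation_density_percent`, the `threshold` parameter, and the synthetic data are illustrative assumptions, not part of the tool.

```python
from typing import Sequence


def activation_density_percent(
    activations: Sequence[float], threshold: float = 0.0
) -> float:
    """Return the percentage of positions whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of sampled token positions
    on which the neuron fires. The source page does not define it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Synthetic data (not taken from the page): 6 active positions out of 100,000.
acts = [0.0] * 100_000
for i in range(6):
    acts[i * 1000] = 1.5

density = activation_density_percent(acts)
print(f"{density:.3f}%")
```

A density this low is consistent with a neuron that fires on a narrow set of tokens, which fits the explanation given above.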

    No Known Activations