INDEX
    Explanations

    The neuron detects evaluative quality descriptors—especially the word “mediocre” (and other rating adjectives like “extremely bad”).

    New Auto-Interp
    Negative Logits
     hate
    -0.06
    elease
    -0.06
    HEET
    -0.06
    336
    -0.06
     lance
    -0.06
    -0.06
    =-=-=-=-
    -0.06
     gst
    -0.05
    ))+
    -0.05
    られた
    -0.05
    POSITIVE LOGITS
     mediocre
    0.08
     Instruction
    0.08
     subjects
    0.07
     instruction
    0.07
     국가
    0.07
     midi
    0.07
     gaz
    0.07
    VN
    0.07
    Capability
    0.07
     prostoru
    0.07
    Act Density 0.001%

    No Known Activations