INDEX
    Explanations

    The neuron is activated by occurrences of “review” (and its plural “reviews”), i.e. it detects review-related terms.

    Negative Logits
    Token           Logit
    " object"       -0.07
    " Dict"         -0.07
    " gaping"       -0.07
    "AT"            -0.07
    "Saint"         -0.07
    " horns"        -0.06
    "manent"        -0.06
    " groupName"    -0.06
    " Dit"          -0.06
    " Parti"        -0.06
    Positive Logits
    Token           Logit
    " reviewer"     0.07
    " review"       0.07
    "وليو"          0.07
    " reviews"      0.07
    "んだ"          0.07
    " Review"       0.06
    "reviews"       0.06
    "yr"            0.06
    " Reviews"      0.06
    "овал"          0.06
    Activation Density: 0.011%

    No Known Activations
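
The activation density reported above is commonly read as the fraction of token positions on which the neuron fires. The following is a minimal sketch under that assumption. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard that produced this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    (here: above-threshold) activation. The source does not define it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: one active position out of 10,000 tokens.
sample = [0.0] * 9999 + [1.3]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.010%"
```

Under this reading, a density of 0.011% would correspond to roughly one active token in every 9,000, which fits a neuron tied to a narrow set of terms.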