    Explanations

    The neuron fires on mentions of evaluation datasets (e.g. KAIST, UCSD, “dataset”) in the text.

    Negative Logits
        Token (quoted)    Logit
        " ships"          -0.07
        " ship"           -0.07
        "care"            -0.06
        " Ves"            -0.06
        " hardly"         -0.06
        " vys"            -0.06
        " LOVE"           -0.06
        "_cap"            -0.06
        "-services"       -0.06
        "_window"         -0.06

    Positive Logits
        Token (quoted)    Logit
        "larının"          0.07
        " steroids"        0.06
        " clinic"          0.06
        "<center"          0.06
        " قالب"            0.06
        " borderTop"       0.06
        " através"         0.06
        " metab"           0.06
        "_rr"              0.06
        " fetisch"         0.06

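    Logit lists like these are commonly produced by projecting a neuron's output weights through the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below illustrates that idea under this assumption only; it is not necessarily how this page computes its values, it ignores details such as layer-norm folding, and the helper name `logit_effects`, the toy vocabulary, and the weights are all hypothetical. The default `k=10` simply mirrors the ten tokens listed per side above.

```python
def logit_effects(w_out, w_u, vocab, k=10):
    """Hypothetical helper: rank tokens by a neuron's direct effect on the logits.

    w_out: the neuron's output weight vector, length d_model.
    w_u:   unembedding matrix as d_model rows, each with one entry per token.
    vocab: token strings, one per unembedding column.
    Returns (most_negative, most_positive), each a list of (token, logit) pairs.
    """
    d_model = len(w_out)
    # Direct logit effect on token j: sum_i w_out[i] * w_u[i][j], i.e. w_out @ W_U.
    effects = [
        sum(w_out[i] * w_u[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by effect: the front holds the most negative tokens.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive


# Toy, made-up weights: 2-dimensional residual stream, 4-token vocabulary.
neg, pos = logit_effects(
    w_out=[1.0, 2.0],
    w_u=[[0.5, -1.0, 0.0, 2.0],
         [1.0, 0.0, -1.0, 0.5]],
    vocab=["tok_a", "tok_b", "tok_c", "tok_d"],
    k=2,
)
print(neg)  # [('tok_c', -2.0), ('tok_b', -1.0)]
print(pos)  # [('tok_d', 3.0), ('tok_a', 2.5)]
```

    With real weights, the two returned lists would play the roles of the negative and positive logit tables; small magnitudes such as the ±0.06 to ±0.07 shown here indicate a weak direct effect on any single token.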
    Activation Density: 0.027%
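
    Activation density is usually read as the share of token positions on which the neuron is active. Assuming "active" means an activation strictly above zero (the threshold behind this page's figure is not stated), 0.027% corresponds to roughly 27 activating tokens per 100,000. The helper `activation_density` and its `threshold` parameter below are hypothetical, a minimal sketch of that arithmetic rather than the dashboard's own code.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    # Count positions where the neuron counts as "active" under the assumed threshold.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 27 active positions out of 100,000 gives a density of 0.027%.
acts = [0.0] * 99_973 + [1.0] * 27
print(activation_density(acts))
```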

    No Known Activations