Explanations
The neuron fires on mentions of evaluation datasets (e.g. KAIST, UCSD, “dataset”) in the text.
Negative Logits
ships       -0.07
ship        -0.07
care        -0.06
Ves         -0.06
hardly      -0.06
vys         -0.06
LOVE        -0.06
_cap        -0.06
-services   -0.06
_window     -0.06
Positive Logits
larının     0.07
steroids    0.06
clinic      0.06
<center     0.06
قالب        0.06
borderTop   0.06
através     0.06
metab       0.06
_rr         0.06
fetisch     0.06
Activations Density: 0.027%