Explanations
The neuron is activated by occurrences of “review” (and its plural “reviews”), i.e. it detects review-related terms.
Negative Logits
object     -0.07
Dict       -0.07
gaping     -0.07
AT         -0.07
Saint      -0.07
horns      -0.06
manent     -0.06
groupName  -0.06
Dit        -0.06
Parti      -0.06
Positive Logits
reviewer  0.07
review    0.07
وليو      0.07
reviews   0.07
んだ      0.07
Review    0.06
reviews   0.06
yr        0.06
Reviews   0.06
овал      0.06
Activations Density: 0.011%
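For context, activation density is commonly read as the fraction of tokens on which a neuron's activation is nonzero. The sketch below assumes that definition, since the page does not state how its figure is computed. The `activation_density` helper, its `threshold` parameter, and the sample activations are all hypothetical and not taken from the page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up activations for illustration only: 1 active token out of 10,000.
sample = [0.0] * 9_999 + [3.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density near 0.011% means the neuron fires on roughly one token in nine thousand, which fits a neuron tied to a single word family.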