INDEX
Explanations
textbook
The neuron activates specifically on mentions of “textbook” (and its plural “textbooks”).
Negative Logits
shields      -0.08
forcement    -0.07
-match       -0.07
Gh           -0.07
predictions  -0.07
Fitz         -0.06
Advice       -0.06
Juice        -0.06
Detect       -0.06
.diag        -0.06
Positive Logits
textbook     0.08
ением        0.07
CGFloat      0.06
stone        0.06
tej          0.06
Tobacco      0.06
(blank)      0.06
Ear          0.06
очный        0.06
дитини       0.06
Activations Density 0.003%
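
Logit lists like the ones above are commonly produced by projecting a neuron's output direction through the model's unembedding matrix and keeping the most negative and most positive tokens. The page does not say how its values were computed, so the following is only a minimal sketch under that assumption. The function name `top_logit_effects`, the toy vocabulary, and the two-dimensional weights are all hypothetical and chosen for illustration.

```python
def top_logit_effects(direction, unembed, vocab, k=2):
    """Rank tokens by their (assumed) logit effect for one neuron.

    direction: the neuron's output direction, a list of d floats (hypothetical).
    unembed:   one row of d floats per vocabulary token (hypothetical).
    vocab:     token strings, in the same order as the rows of `unembed`.
    Returns (negative, positive): the k lowest and k highest (token, effect) pairs.
    """
    # Logit effect of each token = dot product of its unembedding row
    # with the neuron's output direction.
    effects = [sum(w * x for w, x in zip(row, direction)) for row in unembed]
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy data only: these weights are invented so the ranking is easy to follow.
vocab = ["textbook", "shields", "stone", "Advice"]
unembed = [[0.08, 0.0], [-0.08, 0.0], [0.06, 0.0], [-0.06, 0.0]]
direction = [1.0, 0.0]

negative, positive = top_logit_effects(direction, unembed, vocab)
for token, effect in negative:
    print(f"{token:<12}{effect:+.2f}")
for token, effect in positive:
    print(f"{token:<12}{effect:+.2f}")
```

With real model weights the same ranking step would run over the full vocabulary, which is why unrelated token fragments can appear next to the token the neuron is named after.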