INDEX
Explanations
The neuron fires on longer, domain-specific technical terms and specialized jargon.
Negative Logits
frightened     -0.07
Gay            -0.06
ApiService     -0.06
importantes    -0.06
.mutable       -0.06
others         -0.06
.note          -0.06
心             -0.06
araya          -0.06
Reasons        -0.06
Positive Logits
Work           0.09
footage        0.08
zboží          0.08
Conditional    0.08
legislation    0.08
research       0.08
imagery        0.07
signage        0.07
work           0.07
Regulation     0.07
Activations Density 0.864%
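
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that density means the percentage of tokens on which the neuron's activation exceeds a threshold of zero; the page does not state its exact definition, so that reading is an assumption. The function name `activation_density` and the sample data are hypothetical and for illustration only.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the neuron is
    active; the source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 1000 tokens, 9 of which activate the neuron.
sample = [0.0] * 991 + [1.5] * 9
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.864% would mean the neuron is active on fewer than 1 in 100 tokens.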