Explanations
Problems and malfunctions
The neuron activates on words describing reported problems, faults, or investigations (e.g., "incident", "discovered", "corrosion", "difficulty").
Negative Logits
Logit  Token
-0.07  ap
-0.07  jenter
-0.06  ----------------------------------------------------------------------------------------------------------------
-0.06  nale
-0.06  sug
-0.06  mezun
-0.06  masturb
-0.06  intern
-0.06  През
-0.06  (log
Positive Logits
Logit  Token
0.07   YLES
0.07   (Long
0.07   rons
0.06   irate
0.06   .Future
0.06   symmetric
0.06   ;"↵
0.06   Finding
0.06   setFont
0.06   _OVERFLOW
Activations Density 0.010%