Explanations
Introductions/Overviews
The neuron fires on hedging or qualifying language: words and phrases that express uncertainty, difficulty, or broad generalization (e.g. “heavily,” “variety of factors,” “difficult to identify”).
Negative Logits
Token          Logit
nr             -0.07
reactionary    -0.07
tuple          -0.06
ωσε            -0.06
преж           -0.06
KB             -0.06
Tate           -0.06
_steps         -0.06
beer           -0.06
Ward           -0.06
Positive Logits
Token          Logit
оком           0.07
_boolean       0.06
unthinkable    0.06
precisely      0.06
ляют           0.06
Disaster       0.06
physicists     0.06
.clientY       0.06
материалов     0.06
своє           0.06
Activations Density 0.199%
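The page does not define the density figure. Assuming it is the share of sampled token positions where the neuron's activation is above zero, it could be computed roughly as in the sketch below. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical, chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (active positions) / (all sampled positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, only 2 with a nonzero activation.
sample = [0.0] * 998 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that reading, a density of 0.199% would mean the neuron is active on roughly 2 of every 1000 sampled tokens.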