Explanations
The neuron activates strongly on verbs that express assistance or support (e.g. “help,” “assist”).
Negative Logits
Token       Logit
instit      -0.06
zákaz       -0.06
ric         -0.06
quee        -0.06
_records    -0.06
ignon       -0.06
üne         -0.06
정          -0.06
ora         -0.06
nim         -0.06
Positive Logits
Token       Logit
help        0.15
helps       0.13
helped      0.13
helping     0.13
Help        0.12
help        0.12
Helping     0.11
helpful     0.11
Helps       0.11
Help        0.10
Activation Density: 0.133%
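
The page does not say how the density figure is computed. One common reading, assumed here, is that it is the fraction of sampled token positions on which the neuron's activation is above zero. The sketch below illustrates that reading; the function name `activation_density`, the `threshold` parameter, and the toy activation list are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    neuron fires at all. The dashboard itself does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (made up): the neuron fires on 2 of 1500 token positions.
sample = [0.0] * 1498 + [0.7, 1.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.133%
```

Under this reading, a density of 0.133% means the neuron is silent on roughly 749 of every 750 tokens, which fits a neuron tied to one narrow word family.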