Explanations
role/roles
This neuron detects occurrences of the word “role.”
Negative Logits
Token         Logit
weet          -0.08
cleaning      -0.08
KING          -0.07
Davis         -0.07
inspection    -0.07
Mint          -0.06
gorithm       -0.06
Math          -0.06
Smith         -0.06
Bay           -0.06
Positive Logits
Token         Logit
Role          0.14
role          0.14
Role          0.13
role          0.12
-role         0.12
roles         0.12
ROLE          0.10
Roles         0.10
roleName      0.10
.role         0.09
Activations Density 0.038%
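
For readers unfamiliar with the density figure above, the following is a minimal sketch of how such a number could be computed. It assumes that density means the fraction of sampled tokens on which the neuron's activation exceeds a threshold; the page does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (number of active tokens) / (total tokens sampled).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 tokens, of which 4 activate the neuron.
sample = [0.0] * 9996 + [1.2, 0.7, 3.1, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.040%
```

Under this reading, a density of 0.038% would mean the neuron fires on roughly 4 of every 10,000 tokens, which fits a neuron tied to a single word.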