Explanations
The neuron activates on the verb “pose” and its inflected forms (“poses”, “posed”, “posing”) when the verb means presenting a risk or threat, as in “poses a threat”.
Negative Logits
reclaim     -0.08
-third      -0.07
Handbook    -0.07
lett        -0.07
increments  -0.07
Atlantic    -0.07
Atlantic    -0.07
seventh     -0.07
-heart      -0.06
club        -0.06
Positive Logits
posing    0.12
posed     0.11
poses     0.09
pose      0.09
Pos       0.08
Pos       0.07
Pose      0.07
imestep   0.07
Pose      0.07
occured   0.07
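One common way to produce positive and negative logit lists like these is direct logit attribution. The sketch below assumes that each token's score is the dot product of the neuron's output weight vector with that token's column of the unembedding matrix. The dashboard may also apply layer norm or other scaling, which is not modeled here. The names `logit_effects`, `w_out`, and `W_U` are hypothetical.

```python
def logit_effects(w_out, W_U, vocab, k=10):
    """Return the k most promoted and k most suppressed tokens for one neuron.

    w_out: the neuron's output weights, a list of length d_model (hypothetical input).
    W_U:   unembedding matrix as d_model rows, each with one entry per vocab token.
    vocab: token strings, one per column of W_U.
    Assumes the score is a plain dot product (no layer norm or scaling).
    """
    d_model = len(w_out)
    if len(W_U) != d_model:
        raise ValueError("W_U must have one row per dimension of w_out")

    # Score every token by projecting the neuron's output direction onto its unembedding.
    scores = []
    for j, token in enumerate(vocab):
        score = sum(w_out[i] * W_U[i][j] for i in range(d_model))
        scores.append((token, score))

    # Sort ascending: the most negative scores come first.
    scores.sort(key=lambda pair: pair[1])
    negative = scores[:k]
    positive = scores[::-1][:k]
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example with three tokens.
    w_out = [1.0, 0.0]
    W_U = [[0.5, -0.2, 0.1],
           [9.0, 9.0, 9.0]]
    pos, neg = logit_effects(w_out, W_U, ["pose", "club", "x"], k=1)
    print(pos, neg)
```

In the toy example, only the first dimension of `w_out` is nonzero, so the second row of `W_U` contributes nothing. That leaves “pose” as the most promoted token and “club” as the most suppressed.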
Activation density: 0.015%
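A density of 0.015% means the neuron fires very rarely, on roughly 3 of every 20,000 tokens. The sketch below shows that arithmetic. It assumes that density is the percentage of tokens whose activation exceeds a threshold of 0. The dashboard's exact threshold and token sample are not stated, and the function name `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens on which the neuron's activation exceeds `threshold`.

    The threshold of 0.0 is an assumption, not the dashboard's documented rule.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # 3 firing tokens out of 20,000 gives a density of about 0.015%.
    acts = [0.0] * 19997 + [1.2, 0.8, 2.5]
    print(f"{activation_density(acts):.3f}%")
```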