Explanations
compromise
The neuron primarily detects occurrences of the word “compromise.”
Negative Logits
Token       Logit
Sun         -0.07
Animal      -0.07
([])↵       -0.07
JLabel      -0.07
541         -0.07
faculty     -0.07
Flint       -0.07
۱۱          -0.06
13          -0.06
north       -0.06
Positive Logits
Token         Logit
compromise    0.12
compromises   0.09
compromising  0.09
undermin      0.08
compromised   0.08
comprom       0.08
ば            0.07
okens         0.07
isme          0.07
honest        0.07
Activations Density: 0.006%
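
To give a sense of scale for the density figure, here is a minimal sketch. It assumes that activation density means the fraction of sampled tokens on which the neuron has a nonzero activation; the page does not state this definition. The function name `activation_density` and the toy data are hypothetical and used for illustration only.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition; the source page does not say how density is computed.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the neuron fires on 6 of 100,000 tokens.
toy = [0.0] * 99_994 + [1.3] * 6
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.006%
```

Under this assumption, a density of 0.006% corresponds to roughly 6 active tokens per 100,000, which fits a neuron tied to a single word family.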