INDEX
Explanations
The neuron is specifically detecting occurrences of the word “premise.”
New Auto-Interp
Negative Logits
сообщ
-0.07
Джон
-0.07
voll
-0.07
lerdir
-0.07
ัฒนา
-0.06
stagger
-0.06
almış
-0.06
والن
-0.06
.onPause
-0.06
harmon
-0.06
POSITIVE LOGITS
premise
0.08
_stmt
0.06
amet
0.06
base
0.06
concept
0.06
_ENDPOINT
0.06
mey
0.06
선수
0.06
feeding
0.06
base
0.06
Activations Density 0.004%