INDEX
Explanations
Automatic
The neuron detects occurrences of the word “automatic” (in any capitalization).
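As a rough illustration of what this explanation claims, the check below flags text containing the word "automatic" in any capitalization. It is only a minimal text-level proxy for the explanation, not the neuron itself; the function name `mentions_automatic` is hypothetical.

```python
import re

# Whole-word match for "automatic", ignoring case.
# Assumption: the explanation refers to the standalone word,
# so forms such as "automatically" are deliberately not matched.
_AUTOMATIC = re.compile(r"\bautomatic\b", re.IGNORECASE)


def mentions_automatic(text: str) -> bool:
    """Return True if the text contains the word "automatic" in any casing."""
    return _AUTOMATIC.search(text) is not None


if __name__ == "__main__":
    for sample in ["Automatic doors", "an AUTOMATIC update", "manual override"]:
        print(sample, "->", mentions_automatic(sample))
```

Comparing such a proxy against the neuron's real activations on the same text would be one way to sanity-check the explanation.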
Negative Logits
Teil         -0.07
rele         -0.07
-Life        -0.07
Ho           -0.07
exploited    -0.07
Gre          -0.07
Spo          -0.06
revenue      -0.06
Shade        -0.06
den          -0.06
Positive Logits
Automatic    0.09
automatic    0.09
automatic    0.08
oman         0.07
(!$          0.07
自動         0.07
attempts     0.07
declaration  0.07
automat      0.07
electrónico  0.07
Activations Density 0.011%
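
To put the density figure in perspective, the sketch below assumes that activation density means the fraction of tokens on which the neuron's activation is above zero; the page does not define it, so this is an assumption. The function name `activation_density` and the sample data are hypothetical. Under that reading, 0.011% corresponds to roughly 11 active tokens per 100,000.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds the threshold (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 11 active tokens out of 100,000.
acts = [1.0] * 11 + [0.0] * 99_989
density = activation_density(acts)
print(f"{density:.3%}")
```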