Explanations
This neuron fires on occurrences of the word “group” (and its close variants) in a group‐theory context.
Negative Logits
拍            -0.07
757          -0.07
maker        -0.06
очень        -0.06
Doc          -0.06
bet          -0.06
_Delay       -0.06
Microsoft    -0.06
provider     -0.06
대답         -0.06
Positive Logits
grupos       0.08
오늘         0.07
Storm        0.07
rika         0.07
ausge        0.07
CL           0.06
ilitation    0.06
jego         0.06
gluten       0.06
AxisSize     0.06
Activations Density 0.018%
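
As a rough illustration of how a density figure like the one above could be read, the sketch below treats activation density as the fraction of token positions on which the neuron's activation exceeds a threshold. That definition, the function name `activation_density`, the zero threshold, and the sample data are all assumptions made for illustration; the dashboard itself does not state how the number was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the neuron is
    active at all. The source does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 50,000 token positions, 9 of which activate the neuron.
sample = [0.0] * 50_000
for position in range(0, 9_000, 1_000):  # 9 evenly spaced active positions
    sample[position] = 1.5

density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.018% corresponds to roughly 9 active positions per 50,000 tokens, which would be consistent with a neuron that responds only to a narrow lexical context.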