INDEX
Explanations
American
This neuron specifically activates on the word “American.”
Negative Logits
Zheng        -0.07
Whilst       -0.07
Whilst       -0.07
zoo          -0.07
Fixture      -0.07
Thu          -0.07
whilst       -0.06
Planning     -0.06
Submission   -0.06
"If          -0.06
Positive Logits
America      0.14
American     0.14
American     0.11
America      0.11
amer         0.11
Americans    0.10
Americ       0.09
Amer         0.09
man          0.09
normals      0.08
Activations Density 0.027%
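
The page does not say how these figures are produced. As a minimal sketch, assume each token has a single logit effect for this neuron, and assume activation density means the fraction of token positions where the neuron's activation is above zero. The function names `top_logits` and `activation_density` and the activation list are hypothetical; only the five token values are copied from the lists above.

```python
def top_logits(logit_effects, k=10):
    """Return (most negative, most positive) (token, effect) pairs.

    `logit_effects` maps a token string to this neuron's effect on that
    token's logit. How the effects are computed is not covered here.
    """
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # lowest effects first
    positive = ranked[::-1][:k]  # highest effects first
    return negative, positive


def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds `threshold`.

    This definition is an assumption, not something stated on the page.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# A few values taken from the lists above.
effects = {
    "America": 0.14,
    "Americans": 0.10,
    "man": 0.09,
    "Zheng": -0.07,
    "Planning": -0.06,
}
negative, positive = top_logits(effects, k=2)

# Hypothetical activations: 27 active positions out of 100,000.
toy_activations = [0.0] * 99_973 + [1.5] * 27
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this assumed definition, a density of 0.027% corresponds to roughly 27 active positions per 100,000 tokens, which fits a neuron that responds to one word and its close variants.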