    Explanations

    This neuron specifically activates on the word “American.”

    Negative Logits
    Token           Logit
    ' Zheng'        -0.07
    ' Whilst'       -0.07
    'Whilst'        -0.07
    ' zoo'          -0.07
    'Fixture'       -0.07
    ' Thu'          -0.07
    ' whilst'       -0.06
    'Planning'      -0.06
    'Submission'    -0.06
    '"If'           -0.06
    Positive Logits
    Token           Logit
    ' America'      0.14
    ' American'     0.14
    'American'      0.11
    'America'       0.11
    ' amer'         0.11
    ' Americans'    0.10
    ' Americ'       0.09
    ' Amer'         0.09
    'man'           0.09
    ' normals'      0.08
    Activation Density: 0.027%

    No Known Activations