INDEX
    Explanations

    This neuron selectively activates on the adjective “favorable.”

    Negative Logits
    Token           Logit
    " metres"       -0.08
    " Archives"     -0.07
    " scream"       -0.07
    " secret"       -0.07
    " corpses"      -0.07
    ".Substring"    -0.07
    "ropoda"        -0.06
    " Screens"      -0.06
    " loops"        -0.06
    " inserts"      -0.06

    Positive Logits
    Token           Logit
    " unfavorable"  0.10
    " favorable"    0.08
    " favourable"   0.08
    " unfavor"      0.07
    "ULA"           0.07
    "favor"         0.07
    " TensorFlow"   0.07
    " благодаря"    0.06
    "ToUpper"       0.06
    " flattering"   0.06
    Activation Density: 0.004%

    No Known Activations
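
    To put the activation density reported above in concrete terms, here is a minimal sketch. It assumes that density means the fraction of sampled tokens on which the neuron's activation exceeds zero; the page itself does not state the definition. The function name, the threshold, and the sample values are hypothetical and chosen only to reproduce the 0.004% figure.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    neuron fires at all; this definition is not given in the source page.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 tokens, of which 4 activate the neuron.
sample = [0.0] * 99_996 + [1.3, 0.7, 2.1, 0.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

    Under that assumption, a density of 0.004% corresponds to roughly 4 activating tokens per 100,000 sampled.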