INDEX
    Explanations

    has not been explained as it seems like the conclusion to an instruction script and does not pertain to any attention head behavior in the initial context given.

    Negative Logits
    Token             Logit
    "_air"            -0.08
    "Misc"            -0.07
    "Hom"             -0.07
    " azure"          -0.07
    " efficiencies"   -0.07
    " ach"            -0.07
    "almost"          -0.07
    "indlu"           -0.07
    " admission"      -0.07
    "اده"             -0.07

    Positive Logits
    Token             Logit
    " apostles"       0.08
    " patri"          0.08
    " Kathmandu"      0.08
    " Clement"        0.08
    " nursing"        0.08
    " Ordin"          0.08
    "läge"            0.08
    " cultivating"    0.08
    " Jem"            0.08
    " contribut"      0.07

    Act Density 0.001%

    No Known Activations