INDEX
    Explanations

    Attends from later tokens back to earlier tokens that denote specific grammatical relationships.

    Head Attr Weights
    0:0.12
    1:0.14
    2:0.10
    3:0.09
    4:0.14
    5:0.15
    6:0.10
    7:0.12
    Negative Logits
     ſtate         -0.38
     ftate         -0.36
     referenties   -0.35
     myſelf        -0.35
     themſelves    -0.35
     Houſe         -0.34
     pleaſure      -0.33
    المناصب        -0.33
     uſe           -0.32
     purpoſe       -0.32
    Positive Logits
    getMenuInflater   0.26
    ConstraintMaker   0.26
    createCell        0.25
    kamers            0.25
    ArgumentParser    0.23
    PreferredItem     0.23
    SpringBootTest    0.23
    جة                0.23
     nœ               0.22
    mazaki            0.22
    Act Density 0.034%

    No Known Activations