INDEX
    Explanations

    phrases that denote the presence of a subject or entity, often at the beginning of sentences

    Head Attr Weights
    0:0.03
    1:0.03
    2:0.06
    3:0.21
    4:0.07
    5:0.05
    6:0.17
    7:0.03
    8:0.06
    9:0.09
    10:0.09
    11:0.05
    Negative Logits
     quir          -1.34
     hotly         -1.25
     separately    -1.21
     withd         -1.20
     latter        -1.17
     accordingly   -1.16
     plunge        -1.15
     stricken      -1.13
     snowy         -1.12
     subsequent    -1.12
    Positive Logits
    ")      2.63
    "]      2.55
    "),     2.55
    "))     2.53
    .")     2.34
    "?      2.22
    "],     2.19
    ").     2.19
     …"     2.18
    ");     2.15
    Activation Density 0.048%

    No Known Activations