INDEX
    Explanations

    phrases indicating decisive actions and outcomes

    Head Attr Weights
    0:0.02
    1:0.02
    2:0.09
    3:0.14
    4:0.02
    5:0.04
    6:0.05
    7:0.11
    8:0.07
    9:0.19
    10:0.06
    11:0.15
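
    The head attribution weights above are easier to compare once sorted. A minimal sketch, assuming the listing is read as head index to weight; the names head_weights and rank_heads are hypothetical and not part of the dashboard:

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.02, 1: 0.02, 2: 0.09, 3: 0.14, 4: 0.02, 5: 0.04,
    6: 0.05, 7: 0.11, 8: 0.07, 9: 0.19, 10: 0.06, 11: 0.15,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted from largest to smallest weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_heads(head_weights)
top3 = ranked[:3]

# The listed values are rounded, so the total need not be exactly 1.
total = round(sum(head_weights.values()), 2)

# Share of the listed total carried by the three largest heads.
top3_share = round(sum(w for _, w in top3) / sum(head_weights.values()), 2)

for head, weight in top3:
    print(f"head {head}: {weight:.2f}")
print(f"total of listed weights: {total}")
print(f"top-3 share: {top3_share}")
```

    On these numbers, heads 9, 11, and 3 carry the largest weights, and the listed values sum to 0.96 rather than 1, presumably because each is rounded to two decimals.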
    Negative Logits
    "Smith"      -1.10
    "YR"         -1.09
    "LER"        -1.08
    "olini"      -1.05
    " Jindal"    -1.03
    "ーク"       -1.02
    "ーティ"     -0.98
    "gets"       -0.97
    "arse"       -0.96
    "Daily"      -0.95
    Positive Logits
    " sealing"   1.44
    " seal"      1.42
    "rity"       1.28
    "antha"      1.25
    " envelop"   1.24
    "keye"       1.14
    " uter"      1.13
    " sealed"    1.12
    " borders"   1.11
    " secrecy"   1.10
    Act Density 0.005%

    No Known Activations