    Explanations

    phrases that express uncertainty or speculation about events and outcomes

    Head Attribution Weights
    Head    0     1     2     3     4     5     6     7     8     9     10    11
    Weight  0.02  0.02  0.08  0.25  0.14  0.04  0.07  0.11  0.04  0.04  0.06  0.09
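    The sketch below shows one way per-head weights like these could be derived and read. It is a minimal sketch under stated assumptions, not the dashboard's documented method. It assumes this is an attention-output feature whose decoder vector splits into one slice per head. It also assumes each head's weight is the L2 norm of its slice divided by the sum of all slice norms. That normalisation and the helper names head_attr_weights and top_heads are hypothetical. Applied to the listed values, heads 3, 4 and 7 carry the largest shares. The listed weights sum to 0.96 rather than 1.00, presumably because each value is rounded.

```python
import math


def head_attr_weights(head_slices):
    """Share of a feature's decoder norm carried by each attention head.

    ASSUMPTION (hypothetical helper): head_slices[h] is the part of the
    feature's decoder vector belonging to head h, and each weight is that
    slice's L2 norm divided by the sum of all slice norms.
    """
    norms = [math.sqrt(sum(x * x for x in s)) for s in head_slices]
    total = sum(norms)
    if total == 0:
        return [0.0] * len(norms)
    return [n / total for n in norms]


def top_heads(weights, k):
    """Indices of the k heads with the largest weights, largest first."""
    return sorted(range(len(weights)), key=lambda h: weights[h], reverse=True)[:k]


# Weights as listed on the dashboard, indexed by head number.
listed = [0.02, 0.02, 0.08, 0.25, 0.14, 0.04, 0.07, 0.11, 0.04, 0.04, 0.06, 0.09]
print(top_heads(listed, 3))   # [3, 4, 7]
print(round(sum(listed), 2))  # 0.96 (the listed values are rounded)
```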
    Negative Logits
    Token            Logit
    76561            -1.94
    Origin           -1.55
    untarily         -1.45
    src              -1.40
    (not shown)      -1.40
    uci              -1.37
     Quote           -1.32
     Neuroscience    -1.31
    ドラゴン         -1.30
     Celt            -1.30
    Positive Logits
    Token            Logit
    ?'"              1.82
    !?               1.64
    !?"              1.62
    ?!"              1.60
    ?!               1.58
    !'"              1.52
    .'"              1.49
    .<               1.46
    .''.             1.46
    .</              1.44
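    The positive and negative logit lists are presumably produced by projecting the feature's output direction through the model's unembedding matrix and keeping the largest and smallest entries. The sketch below assumes that approach. It also assumes the direction is already expressed in the residual stream; for an attention-output feature, that would be after mapping through the output projection. It ignores any final LayerNorm. The function names, the toy matrix, and the toy vocabulary are hypothetical.

```python
def decoder_logits(direction, W_U):
    """Project a feature direction onto every vocabulary token.

    direction: list of d_model floats, ASSUMED to already live in the
    residual stream. W_U: unembedding as d_model rows by vocab_size columns.
    No final LayerNorm scaling is applied (an assumption of this sketch).
    """
    vocab_size = len(W_U[0])
    return [
        sum(direction[i] * W_U[i][v] for i in range(len(direction)))
        for v in range(vocab_size)
    ]


def top_and_bottom(logits, tokens, k):
    """Return the k most positive and k most negative (token, logit) pairs."""
    order = sorted(range(len(logits)), key=lambda v: logits[v])  # ascending
    bottom = [(tokens[v], logits[v]) for v in order[:k]]
    top = [(tokens[v], logits[v]) for v in order[::-1][:k]]
    return top, bottom


# Toy example: 2-dimensional residual stream, hypothetical 3-token vocabulary.
direction = [1.0, -0.5]
W_U = [
    [2.0, 0.0, -1.0],
    [0.0, 2.0, 1.0],
]
tokens = ["?!", "Origin", "src"]
logits = decoder_logits(direction, W_U)  # [2.0, -1.0, -1.5]
top, bottom = top_and_bottom(logits, tokens, 1)
print(top)     # [('?!', 2.0)]
print(bottom)  # [('src', -1.5)]
```

    Sorting the whole vocabulary once yields both lists. On a real vocabulary of tens of thousands of tokens, a partial selection such as heapq.nlargest would avoid the full sort.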
    Activation Density: 0.001%

    No Known Activations
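    An activation density of 0.001% means the feature fires on roughly 1 token in 100,000. The sketch below shows that calculation. It assumes density is the share of sampled tokens whose activation exceeds a threshold; the threshold here is 0, and the dashboard's actual threshold is not stated. A feature this sparse may simply not appear among the sampled activations, which is one possible reading of the "No Known Activations" note. The function names are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    ASSUMPTION: density = firing tokens / all sampled tokens, as a percent.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


def expected_firings(density_percent, n_tokens):
    """Expected number of firing tokens in a sample of n_tokens."""
    return density_percent / 100.0 * n_tokens


print(activation_density([0.0, 0.0, 0.5, 0.0]))  # 25.0
# At 0.001%, a 1,000,000-token sample would contain about 10 firing tokens.
print(round(expected_firings(0.001, 1_000_000)))  # 10
```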