INDEX
    Explanations

    phrases related to guiding the reader's attention or providing warnings

    Head Attr Weights
    0:0.04
    1:0.02
    2:0.06
    3:0.17
    4:0.10
    5:0.05
    6:0.14
    7:0.04
    8:0.06
    9:0.09
    10:0.09
    11:0.10
    Negative Logits
    " buckets"    -1.55
    " TTL"        -1.42
    " Titanic"    -1.40
    "luster"      -1.36
    " Genie"      -1.29
    " Bounty"     -1.28
    " garbage"    -1.27
    " bucket"     -1.27
    " Boo"        -1.26
    " Bucket"     -1.25
    Positive Logits
    "llor"    1.44
    "cent"    1.40
    "ghai"    1.40
    "doi"     1.33
    "]-"      1.32
    "ndra"    1.31
    "iven"    1.31
    "np"      1.31
    "ior"     1.28
    "ivot"    1.26
    Act Density 0.075%

    No Known Activations