    Explanations

    references to stories and truths, particularly in contrasting contexts

    Head Attr Weights
    0:0.03
    1:0.02
    2:0.16
    3:0.07
    4:0.22
    5:0.03
    6:0.04
    7:0.16
    8:0.04
    9:0.03
    10:0.09
    11:0.07
    Negative Logits
    "inance": -1.36
    "iggins": -1.32
    "merga": -1.29
    "kin": -1.27
    "annis": -1.25
    "aird": -1.25
    "ibu": -1.22
    "masters": -1.21
    "inence": -1.19
    "BILITIES": -1.19
    Positive Logits
    "alogy": 1.42
    " unheard": 1.35
    " excerpts": 1.34
    " aloud": 1.30
    "except": 1.30
    " arcs": 1.27
    " anecdotes": 1.27
    " Painter": 1.25
    " scenes": 1.25
    " format": 1.24
    Act Density: 0.008%

    No Known Activations