INDEX
    Explanations

    phrases that express strong opinions or feelings about a subject

    Head Attr Weights
    0:0.01
    1:0.03
    2:0.06
    3:0.05
    4:0.15
    5:0.02
    6:0.34
    7:0.11
    8:0.03
    9:0.02
    10:0.07
    11:0.06
    Negative Logits
    ////////////////////////////////:-1.30
    ��:-1.26
    edIn:-1.25
    LOAD:-1.24
    :-1.24
    Loading:-1.21
    ��:-1.20
     Modes:-1.19
     Schedule:-1.18
    onomous:-1.16
    Positive Logits
    rison:1.65
     Horowitz:1.51
    aughs:1.39
    nce:1.35
    ipple:1.35
     impression:1.34
    ensical:1.32
    onement:1.32
    alks:1.31
    arate:1.31
    Act Density 0.040%

    No Known Activations