INDEX
    Explanations

    phrases indicating ease or difficulty in performing actions

    Head Attr Weights
    0:0.10
    1:0.01
    2:0.10
    3:0.12
    4:0.31
    5:0.05
    6:0.02
    7:0.01
    8:0.04
    9:0.12
    10:0.04
    11:0.02
    Negative Logits
    "Stars"        -1.28
    "gmail"        -1.26
    " cliff"       -1.26
    "leased"       -1.20
    "hya"          -1.15
    "ached"        -1.15
    "bon"          -1.12
    "angel"        -1.12
    "afety"        -1.12
    " variance"    -1.11
    Positive Logits
    "than"         1.65
    "Catalog"      1.41
    "ildo"         1.37
    "JUST"         1.29
    " lug"         1.27
    " Luigi"       1.24
    " than"        1.22
    " Snape"       1.20
    "aline"        1.20
    " navigating"  1.19
    Act Density 0.014%

    No Known Activations