INDEX
    Explanations

    The word "We" used as a collective pronoun or to refer to a group

    Head Attr Weights
    0:0.02
    1:0.01
    2:0.06
    3:0.12
    4:0.15
    5:0.03
    6:0.13
    7:0.20
    8:0.04
    9:0.06
    10:0.04
    11:0.09
    Negative Logits
    " ratios"       -1.50
    " downs"        -1.46
    "zers"          -1.40
    "edom"          -1.36
    "olesc"         -1.34
    " euth"         -1.31
    "wolves"        -1.28
    "apo"           -1.26
    "hattan"        -1.25
    " Tsukuyomi"    -1.25
    Positive Logits
    "itely"         1.46
    "Movie"         1.37
    "Ancient"       1.29
    "Record"        1.28
    "aux"           1.27
    "Old"           1.24
    "audio"         1.23
    "iverse"        1.23
    "Text"          1.22
    "Subscribe"     1.22
    Act Density 0.001%

    No Known Activations