INDEX
    Head Attr Weights
    0:0.02
    1:0.01
    2:0.11
    3:0.06
    4:0.22
    5:0.05
    6:0.03
    7:0.22
    8:0.04
    9:0.04
    10:0.08
    11:0.07
    Negative Logits
    "phia"       -1.36
    "cca"        -1.30
    "SHA"        -1.29
    "ppa"        -1.24
    "lihood"     -1.24
    "bows"       -1.19
    "SHIP"       -1.18
    "Merit"      -1.16
    "anus"       -1.15
    "iflower"    -1.15
    Positive Logits
    " mism"             1.38
    " favourable"       1.34
    " knowledgeable"    1.32
    " teams"            1.27
    " clues"            1.25
    " volunteers"       1.24
    " specific"         1.24
    " perspectives"     1.23
    " diagrams"         1.21
    " brackets"         1.20
    Act Density 0.001%

    No Known Activations