    Explanations

    phrases indicating motivation or influence behind actions

    Head Attr Weights
    0:0.02
    1:0.01
    2:0.07
    3:0.06
    4:0.13
    5:0.04
    6:0.03
    7:0.34
    8:0.05
    9:0.03
    10:0.07
    11:0.10
    Negative Logits
    "jud"         -1.53
    "Redditor"    -1.52
    "fing"        -1.49
    "pants"       -1.45
    (blank)       -1.45
    "onyms"       -1.41
    "ummer"       -1.41
    (blank)       -1.41
    "nick"        -1.39
    "dden"        -1.39
    Positive Logits
    "andise"            1.64
    " conver"           1.50
    " developments"     1.50
    " obs"              1.48
    " wedge"            1.48
    " dwar"             1.42
    " Wer"              1.42
    " funnel"           1.40
    " consolidation"    1.38
    " movements"        1.36
    Act Density 0.001%

    No Known Activations