INDEX
    Explanations

    phrases indicating emotional or subjective evaluations

    Head Attr Weights
    0:0.14
    1:0.08
    2:0.13
    3:0.04
    4:0.10
    5:0.11
    6:0.05
    7:0.02
    8:0.11
    9:0.09
    10:0.03
    11:0.05
    Negative Logits
     0004:-1.60
     scent:-1.59
     UNCLASSIFIED:-1.50
    (no token shown):-1.42
    (no token shown):-1.40
     inning:-1.40
    ovych:-1.38
     vibe:-1.38
    CLASSIFIED:-1.35
     nerv:-1.33
    Positive Logits
    lambda:1.93
    upload:1.88
    trans:1.87
    dr:1.87
    their:1.87
    suff:1.85
    limits:1.83
    split:1.80
    fixed:1.79
    them:1.79
    Act Density 0.011%

    No Known Activations