INDEX
    Explanations

    concepts related to comparison and quantities

    Head Attr Weights
    0:0.04
    1:0.01
    2:0.10
    3:0.08
    4:0.30
    5:0.03
    6:0.06
    7:0.12
    8:0.05
    9:0.04
    10:0.07
    11:0.05
    Negative Logits
     surrog       -1.51
     fluct        -1.46
     privately    -1.45
     quietly      -1.42
     aloud        -1.42
     unfold       -1.40
     alternating  -1.39
     rand         -1.39
     muted        -1.37
     softly       -1.37
    Positive Logits
    trak          1.68
    coming        1.66
    furt          1.61
    vana          1.59
    cit           1.58
    cross         1.55
    iership       1.48
    abol          1.47
    icket         1.47
    DonaldTrump   1.44
    Act Density 0.001%

    No Known Activations