INDEX
    Explanations

    phrases expressing identity and belonging

    Head Attr Weights
    0:0.02
    1:0.01
    2:0.18
    3:0.07
    4:0.11
    5:0.03
    6:0.09
    7:0.07
    8:0.10
    9:0.04
    10:0.11
    11:0.11
    Negative Logits
    "opened": -1.78
    "activated": -1.71
    "urned": -1.62
    "opped": -1.61
    "pressed": -1.55
    "reviewed": -1.53
    " Sparks": -1.47
    "onest": -1.45
    " CPC": -1.44
    " Downs": -1.44
    Positive Logits
    "BILITY": 1.85
    "ahime": 1.77
    "iasm": 1.66
    "Appearance": 1.62
    "...]": 1.59
    " infer": 1.55
    " unemploy": 1.55
    "GW": 1.53
    "ADS": 1.52
    "ModLoader": 1.52
    Act Density 0.001%

    No Known Activations