    Head Attr Weights
    0:0.03
    1:0.03
    2:0.16
    3:0.11
    4:0.06
    5:0.03
    6:0.21
    7:0.02
    8:0.02
    9:0.05
    10:0.14
    11:0.08
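
As a rough illustration of reading the head attribution weights above, the sketch below ranks the heads by weight and sums the largest ones. The dictionary is copied from the listed values; the names `head_attr_weights` and `rank_heads` are hypothetical and not part of any dashboard API. The listed values appear to be rounded to two decimals, so the totals are approximate.

```python
# Head attribution weights copied from the list above (head index -> weight).
head_attr_weights = {
    0: 0.03, 1: 0.03, 2: 0.16, 3: 0.11, 4: 0.06, 5: 0.03,
    6: 0.21, 7: 0.02, 8: 0.02, 9: 0.05, 10: 0.14, 11: 0.08,
}


def rank_heads(weights, top_k=3):
    """Return the top_k (head, weight) pairs, largest weight first."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


top_heads = rank_heads(head_attr_weights)

# Round to two decimals to match the precision of the listed values.
total = round(sum(head_attr_weights.values()), 2)
top_share = round(sum(weight for _, weight in top_heads), 2)

print(top_heads, total, top_share)
```

With these values, heads 6, 2 and 10 carry the largest weights and together account for 0.51 of the listed total of 0.94.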
    Negative Logits
    'afety'          -1.38
    'saf'            -1.35
    ' eyebrows'      -1.27
    ' doubts'        -1.20
    'oret'           -1.17
    'ć'              -1.17
    ' suspicions'    -1.17
    ' testifying'    -1.15
    'rums'           -1.14
    'iasco'          -1.14
    Positive Logits
    (token not shown)    1.54
    'amaz'               1.45
    ' Coins'             1.42
    (token not shown)    1.42
    ']."'                1.37
    (token not shown)    1.36
    'merce'              1.30
    'Premium'            1.29
    'Interstitial'       1.29
    'daq'                1.28
    Act Density 0.000%

    No Known Activations