INDEX
    Head Attr Weights
    0:0.09
    1:0.08
    2:0.08
    3:0.08
    4:0.09
    5:0.07
    6:0.08
    7:0.08
    8:0.07
    9:0.08
    10:0.07
    11:0.07
    Negative Logits
    Token                 Logit
    jri                   -3.35
    anyahu                -3.11
    [no visible token]    -3.02
    DonaldTrump           -2.91
     Erdogan              -2.87
     "$:/                 -2.80
     Putin                -2.78
    arnaev                -2.76
    acan                  -2.75
    %"                    -2.72
    Positive Logits
    Token                 Logit
     reader               2.72
     buckets              2.69
     RV                   2.60
    opian                 2.51
     caregivers           2.50
     hormones             2.40
     Veg                  2.37
    ruciating             2.36
     rabbit               2.34
     hobby                2.34
    Act Density 0.000%

    No Known Activations