INDEX
    Explanations
    Head Attribution Weights (head:weight)
    0:0.07
    1:0.01
    2:0.05
    3:0.10
    4:0.04
    5:0.04
    6:0.02
    7:0.46
    8:0.06
    9:0.02
    10:0.03
    11:0.04
    Negative Logits
    Token               Logit
    "Reviewer"          -2.87
    " objectionable"    -2.56
    " clouds"           -2.34
    " differ"           -2.27
    "Neither"           -2.25
    " Both"             -2.23
    " adversary"        -2.23
    "Both"              -2.22
    " intrusive"        -2.17
    " capitals"         -2.16
    Positive Logits
    Token               Logit
    "nen"               2.66
    " Lyme"             2.53
    " subsequ"          2.44
    " Healthy"          2.43
    "abase"             2.41
    "orers"             2.33
    "lished"            2.32
    " Fitness"          2.29
    "eport"             2.23
    " Ware"             2.22
    Activation Density: 0.001%

    No Known Activations