    Explanations

    phrases that indicate actions or accusations related to individuals or groups

    Head Attr Weights
    0:0.02
    1:0.03
    2:0.07
    3:0.07
    4:0.02
    5:0.03
    6:0.05
    7:0.05
    8:0.06
    9:0.27
    10:0.12
    11:0.17
    Negative Logits
    Comments  -1.30
    Pokémon  -1.26
    Size  -1.25
     greets  -1.21
    Discussion  -1.21
    FILE  -1.16
     unfolds  -1.15
    ISE  -1.14
    Ball  -1.14
    oku  -1.14
    Positive Logits
     wounding  1.48
     distortion  1.35
     cannibal  1.34
    grave  1.30
     sabot  1.30
     killing  1.30
     gou  1.26
     VAT  1.25
     distortions  1.18
     inflicting  1.17
    Activation Density: 0.011%

    No Known Activations