INDEX
    Explanations

    numeric values related to sentences or passages in a legal or news context

    phrases related to legal actions and consequences

    Head Attr Weights (head: weight)
        0: 0.05
        1: 0.01
        2: 0.16
        3: 0.04
        4: 0.25
        5: 0.04
        6: 0.02
        7: 0.01
        8: 0.23
        9: 0.07
        10: 0.04
        11: 0.01
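    The head attribution weights are concentrated in a few heads. The short sketch below ranks the listed values and computes what share of their total the top three heads carry. The listed weights sum to about 0.93 rather than exactly 1.0, so the sketch divides by their actual total instead of assuming they are normalized. The names `HEAD_ATTR_WEIGHTS`, `top_heads`, and `share_of_top` are hypothetical helpers for illustration.

```python
# Head attribution weights copied from the listing (head index -> weight).
HEAD_ATTR_WEIGHTS = {
    0: 0.05, 1: 0.01, 2: 0.16, 3: 0.04, 4: 0.25, 5: 0.04,
    6: 0.02, 7: 0.01, 8: 0.23, 9: 0.07, 10: 0.04, 11: 0.01,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights, largest first."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


def share_of_top(weights, k=3):
    """Fraction of the listed total weight carried by the top-k heads.

    Divides by the actual sum of the listed values, since they are not
    guaranteed to sum to 1.0 (here they sum to roughly 0.93).
    """
    total = sum(weights.values())
    return sum(w for _, w in top_heads(weights, k)) / total


print(top_heads(HEAD_ATTR_WEIGHTS))  # [(4, 0.25), (8, 0.23), (2, 0.16)]
print(round(share_of_top(HEAD_ATTR_WEIGHTS), 2))  # 0.69
```

    By this measure heads 4, 8, and 2 together account for roughly 69% of the listed attribution weight.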
    Negative Logits (token: logit)
        "rouse": -1.31
        "ignant": -1.30
        "pir": -1.29
        "enegger": -1.27
        "PRESS": -1.22
        "ergy": -1.16
        (token not shown): -1.16
        "QL": -1.15
        "eff": -1.15
        " Squirrel": -1.14
    Positive Logits (token: logit)
        " Pastebin": 1.32
        " bin": 1.28
        "umbn": 1.27
        "iths": 1.24
        "washing": 1.23
        "lain": 1.21
        " mats": 1.19
        "ppe": 1.16
        " Bless": 1.15
        "geon": 1.14
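    Dashboards of this kind commonly build the positive and negative logit lists by scoring every vocabulary token against the feature's output direction (a dot product with that token's unembedding vector) and reporting the highest and lowest scores. This page does not state its exact method, so the following is a minimal sketch under that assumption. `direction` and `unembed` are hypothetical toy values, not this feature's weights, and any additional projection the real pipeline applies (for example through attention output weights) is omitted.

```python
def logit_effects(direction, unembed):
    """Score each token by the dot product of the feature direction with its unembedding.

    direction: list of d_model floats (hypothetical feature output direction).
    unembed:   dict mapping token string -> list of d_model floats (hypothetical).
    """
    return {
        tok: sum(d * u for d, u in zip(direction, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return (k highest-scoring tokens, k lowest-scoring tokens) as (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    return ranked[::-1][:k], ranked[:k]


# Toy example with d_model = 2 and a four-token vocabulary.
direction = [1.0, -0.5]
unembed = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5], "d": [-1.0, 0.0]}

positive, negative = top_and_bottom(logit_effects(direction, unembed), k=2)
print(positive)  # [('a', 1.0), ('c', 0.25)]
print(negative)  # [('d', -1.0), ('b', -0.5)]
```

    Token strings are kept as exact keys, so a leading space (as in " Pastebin" versus "bin") distinguishes two different vocabulary entries.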
    Act Density: 0.006%
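    An activation density of 0.006% is a fraction of 6e-5, or about 6 firing tokens per 100,000. The sketch below assumes density means the percentage of token positions where the feature's activation is strictly positive; this page does not define the term, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumes "density" counts a token as firing when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 6 firing positions out of 100,000 tokens gives 0.006%.
acts = [1.0] * 6 + [0.0] * 99_994
print(activation_density(acts))
```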

    No Known Activations