    Explanations

    terms related to allegations and claims

    Negative Logits
    icles             -0.16
    ugs               -0.16
    riz               -0.15
    .scalablytyped    -0.15
    ozici             -0.15
    еÑĨ               -0.15
    ocular            -0.14
    еле               -0.14
    Äĩi               -0.14
     ç±               -0.14
    Positive Logits
    edly              0.28
    orical            0.28
    iances            0.26
     Alleg            0.24
    ory               0.24
    iance             0.23
    ged               0.21
    ato               0.20
     alleg            0.19
    ories             0.18
    Activation Density: 0.005%

    No Known Activations