INDEX
    Explanations

    terms related to conflict or opposition, including descriptions of war crimes and political tensions

    Negative Logits
    "Introduced"    -0.78
    " nod"          -0.78
    "ilk"           -0.70
    "nce"           -0.69
    " answer"       -0.66
    "eus"           -0.64
    "ghan"          -0.64
    " thereof"      -0.63
    "bie"           -0.63
    "iments"        -0.62
    Positive Logits
    " sorts"        1.22
    " theirs"       0.87
    " attrition"    0.82
    " Roses"        0.75
    "course"        0.73
    " hers"         0.73
    "catch"         0.72
    "ãĥĦ"           0.72
    " ours"         0.71
    " yours"        0.71
    Activation Density: 0.090%

    No Known Activations