INDEX
    Explanations

    references to fairness, justice, and equality

    Negative Logits
    PROV        -0.81
    aneous      -0.70
    arij        -0.66
     Norn       -0.65
    ATED        -0.65
    Assembly    -0.63
    pta         -0.61
    ulous       -0.61
    PE          -0.58
    odied       -0.58
    Positive Logits
     Weasley    0.93
    ships       0.88
    fort        0.85
    iton        0.84
    lyn         0.84
    nell        0.83
    Ñĭ          0.80
    itudes      0.79
    ies         0.79
    \\\\\\\\    0.79
    Activation Density: 3.335%

    No Known Activations