INDEX
    Explanations

    proper names, particularly those of individuals involved in a legal context

    Negative Logits
    Ö¼        -0.87
    IBLE      -0.81
    llah      -0.80
    utics     -0.67
    iments    -0.66
    د         -0.66
    ential    -0.66
    cipl      -0.65
     encour   -0.65
    encers    -0.65

    Positive Logits
    laus      1.00
    won       0.94
    lov       0.93
    istani    0.92
     Klux     0.85
    lyak      0.81
     DPR      0.78
    erala     0.77
    patrick   0.77
    Äį        0.76
    Activation Density: 0.073%

    No Known Activations