    Explanations

    legal and policy-related terms and concepts

    Negative Logits
    borough    -0.80
    bard       -0.78
    bage       -0.72
    xon        -0.71
    ko         -0.70
    kind       -0.69
    kaya       -0.69
    boy        -0.68
    ascus      -0.68
    been       -0.65
    Positive Logits
     us                 1.01
     unrestricted       0.85
     users              0.83
     withdrawals        0.80
     experimentation    0.79
    Reviewer            0.79
     access             0.78
     rapists            0.78
     passers            0.76
     me                 0.76
    Activation Density: 0.575%

    No Known Activations