INDEX
    Explanations

    references to individuals or groups and their relationships

    Negative Logits
    Token        Logit
    (not shown)  -0.29
     (“          -0.26
    (not shown)  -0.26
    ’S           -0.24
    ’ll          -0.24
    ’re          -0.23
     ï           -0.23
    ’m           -0.22
    ’ve          -0.22
     “[          -0.22
    Positive Logits
    Token  Logit
    ;s     0.27
    's     0.27
    'a     0.26
    ;'     0.25
    "'     0.25
    '[     0.25
    %'     0.22
    _'     0.22
    ()'    0.22
    *'     0.22
    Activation Density: 0.181%

    No Known Activations