INDEX
    Explanations

    references to personal relationships and familial connections

    Negative Logits
    Token       Logit
    "HAS"       -0.27
    "Were"      -0.27
    " Were"     -0.26
    " Has"      -0.26
    "—are"      -0.25
    "_are"      -0.24
    ".are"      -0.23
    " hanno"    -0.23
    "_has"      -0.23
    " aren"     -0.23
    Positive Logits
    Token       Logit
    " was"      0.40
    " wasn"     0.30
    " became"   0.27
    " could"    0.25
    " couldn"   0.25
    "was"       0.24
    " took"     0.23
    " began"    0.23
    " had"      0.22
    " would"    0.22
    Activation Density: 0.512%

    No Known Activations