    Explanations

    phrases related to physical interactions such as touching, holding, or grabbing specific body parts

    references to physical contact or violence

    Negative Logits
    "Reviewer"      -0.59
    " Trends"       -0.56
    "anish"         -0.55
    " cumulative"   -0.55
    " Atkinson"     -0.53
    " astronauts"   -0.53
    " occupants"    -0.53
    " migrated"     -0.52
    " relevance"    -0.52
    " Surviv"       -0.52
    Positive Logits
    " whom"         0.82
    " because"      0.75
    " while"        0.73
    " whenever"     0.70
    " ASAP"         0.70
    " instead"      0.70
    " lest"         0.70
    " whilst"       0.69
    " onstage"      0.66
    "because"       0.65
    Activation Density: 0.798%

    No Known Activations