INDEX
Explanations
phrases related to physical interactions such as touching, holding, or grabbing specific body parts
references to physical contact or violence
Negative Logits
Reviewer      -0.59
Trends        -0.56
anish         -0.55
cumulative    -0.55
Atkinson      -0.53
astronauts    -0.53
occupants     -0.53
migrated      -0.52
relevance     -0.52
Surviv        -0.52
Positive Logits
whom          0.82
because       0.75
while         0.73
whenever      0.70
ASAP          0.70
instead       0.70
lest          0.70
whilst        0.69
onstage       0.66
because       0.65
Activations Density: 0.798%