INDEX
Explanations
statements indicating concern or impact on a larger scale
statements about the status or condition of various situations and issues
Negative Logits
inquire            -0.72
WRITE              -0.68
fishes             -0.66
eva                -0.64
Acqu               -0.63
CHO                -0.63
shuffle            -0.62
itely              -0.62
confidentiality    -0.61
Choose             -0.61
Positive Logits
causing            1.20
sympt              1.18
fueling            1.15
harming            1.14
fue                1.14
exacerb            1.13
undermining        1.12
overshadow         1.10
prompting          1.07
impacting          1.06
Activations Density: 0.197%