INDEX
Explanations
phrases mentioning the actions or opinions of people
repeated usage of the pronoun "they."
Negative Logits
Globe                             -0.75
Eleven                            -0.74
Kinn                              -0.66
Electrical                        -0.64
Column                            -0.62
Claire                            -0.61
Bates                             -0.60
////////////////////////////////  -0.60
Dome                              -0.60
amia                              -0.60
Positive Logits
're       1.14
've       1.06
'd        1.05
selves    0.97
'll       0.95
stanbul   0.90
encount   0.90
could     0.85
self      0.85
selves    0.83
Activations Density: 0.160%