INDEX
Explanations
instances of the word "You" and related phrases indicating direct address to the reader
Negative Logits
usher       -0.16
yny         -0.15
Uncomment   -0.14
urus        -0.14
numberWith  -0.14
undry       -0.14
浩          -0.14
asser       -0.13
rawer       -0.13
центра      -0.13
Positive Logits
said        0.20
seem        0.20
mentioned   0.18
forgot      0.18
seemed      0.18
seems       0.18
mentioned   0.18
mileage     0.17
could       0.17
stated      0.17
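Feature dashboards of this kind commonly derive the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below assumes that approach; the function `top_logit_effects` and the names `decoder_dir`, `W_U` and `vocab` are hypothetical, and the toy weights and tokens are placeholders rather than values from this feature.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return (most negative, most positive) lists of (token, value) pairs.

    Assumption: a feature's logit effect is decoder_dir @ W_U, i.e. its
    decoder direction projected onto every vocabulary token's unembedding.
    """
    effects = decoder_dir @ W_U            # one value per vocabulary token
    order = np.argsort(effects)            # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy setup (hypothetical): 4-dim residual stream, 6-token vocabulary.
vocab = ["a", "b", "c", "d", "e", "f"]
decoder_dir = np.array([1.0, 0.0, 0.0, 0.0])
W_U = np.array([
    [0.20, 0.19, -0.16, -0.15, 0.17, -0.13],
    [0.0] * 6,
    [0.0] * 6,
    [0.0] * 6,
])

neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(neg)  # [('c', -0.16), ('d', -0.15)]
print(pos)  # [('a', 0.2), ('b', 0.19)]
```

Because the projection is linear, a large positive value means that, when the feature fires, it directly pushes the model toward predicting that token; negative values push away from it.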
Activation Density: 0.042%
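Activation density is usually reported as the fraction of tokens in a sample on which the feature's activation is strictly positive. Assuming that definition, 0.042% corresponds to about 42 firing tokens per 100,000, or roughly 1 in 2,400. A minimal sketch with a hypothetical activation array (the function name `activation_density` is illustrative):

```python
import numpy as np

def activation_density(acts):
    """Fraction of tokens where the feature activation is > 0 (assumed definition)."""
    acts = np.asarray(acts)
    return np.count_nonzero(acts > 0) / acts.size

# Hypothetical sample: 100,000 tokens, feature fires on 42 of them.
acts = np.zeros(100_000)
acts[:42] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # 0.042%
```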