INDEX
Explanations
instances of the word "In" used to indicate the beginning of statements or clauses
Negative Logits
Token     Logit
duct      -0.23
sofar     -0.23
odore     -0.20
spite     -0.17
wards     -0.17
ducted    -0.17
/by       -0.17
behalf    -0.16
depend    -0.16
adays     -0.16
Positive Logits
Token      Logit
addition   0.30
additions  0.24
contrast   0.23
Addition   0.23
add        0.23
added      0.22
essence    0.21
contrast   0.21
sum        0.20
reality    0.20
Activations Density 0.164%
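
The density figure above is commonly read as the fraction of token positions on which the feature fires. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the toy activation list are assumptions for illustration and are not taken from the page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (hypothetical): 164 active positions out of 100,000 tokens.
toy_activations = [0.0] * 99_836 + [0.7] * 164
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density well below 1% means the feature is sparse: it is inactive on the vast majority of tokens.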