INDEX
Explanations
phrases related to causation or explanation using the word "because."
repetitive mentions of the word "the."
Negative Logits
vernment       -0.76
dale           -0.67
advertising    -0.67
DB             -0.66
ira            -0.64
mares          -0.63
ancer          -0.62
ojure          -0.60
shaw           -0.59
edia           -0.58
Positive Logits
result         1.35
same           1.31
same           1.21
culmination    1.13
ologically     1.12
ones           1.10
envy           1.09
equivalent     1.08
hardest        1.06
easiest        1.01
Activations Density 0.137%