INDEX
Explanations
instances of conditional statements
references to the word "that" in various contexts
Negative Logits
apolis   -0.86
hops     -0.82
busters  -0.82
ainers   -0.81
AMS      -0.79
hens     -0.78
okers    -0.77
Roose    -0.77
rypt     -0.77
inces    -0.76
Positive Logits
translates     1.02
happens        0.98
entails        0.96
person         0.96
distinction    0.95
determination  0.95
same           0.93
pesky          0.93
particular     0.90
knowledge      0.87
Activations Density: 0.130%