INDEX
Explanations
declarative statements followed by a comparison or contrast
occurrences of the word "None."
Negative Logits
marking     -0.65
lif         -0.65
ideal       -0.60
guys        -0.60
planners    -0.59
hips        -0.58
resear      -0.58
agers       -0.58
tours       -0.58
rave        -0.58
Positive Logits
None         3.55
None         2.54
none         1.92
none         1.79
Nothing      1.59
Neither      1.34
NULL         1.32
False        1.30
Nothing      1.22
TBA          1.18
Activations Density: 0.010%