Explanations
phrases related to causality and logical connections
Negative Logits
Token          Logit
Yep            -1.03
Really         -0.88
Seriously      -0.83
Enlarge        -0.82
Thumbnail      -0.81
Pretty         -0.80
atron          -0.79
Yeah           -0.78
Wait           -0.78
Nope           -0.77
Positive Logits
Token            Logit
deviations       1.04
embodiments      1.00
considerable     0.91
however          0.89
there            0.89
we               0.87
implementations  0.86
although         0.85
excessive        0.85
heterogeneity    0.85
Activations Density: 0.337%