Explanations
assertions and validation checks in code
Negative Logits
Howard    -0.15
par       -0.15
levance   -0.15
chooser   -0.15
indeed    -0.14
tab       -0.14
post      -0.14
508       -0.13
,         -0.13
r         -0.13
Positive Logits
deep      0.41
Deep      0.38
deep      0.36
Deep      0.36
_deep     0.35
.deep     0.32
深        0.31
deepest   0.26
глуб      0.25
深        0.25
Activations Density 0.007%