INDEX
Explanations
references to data metrics and evaluation criteria in a given context
Negative Logits
)↵      -0.38
)↵↵     -0.37
]↵      -0.32
")↵     -0.32
)       -0.31
")↵     -0.30
))↵     -0.30
]↵↵     -0.29
_)↵     -0.29
")      -0.28
Positive Logits
}.      0.38
).      0.32
}.↵     0.32
!).     0.30
}.      0.29
].      0.29
).č↵    0.29
».      0.29
'].     0.28
/>.     0.28
Activations Density: 0.304%