Explanations
instances of summary or concise statements in text
Negative Logits
WithDuration    -0.18
Niet            -0.17
pNet            -0.15
SUBSTITUTE      -0.13
IMIT            -0.13
uber            -0.13
伯              -0.13
matchmaking     -0.13
verse           -0.13
ob              -0.13
Positive Logits
_nat            0.17
enance          0.16
ERGE            0.16
enton           0.16
arnation        0.15
ynam            0.15
ugar            0.15
ckt             0.15
ucid            0.14
ootball         0.14
Activations Density 0.010%