INDEX
Explanations
sentences that contain instructions or recommendations
Negative Logits
原始内容存档于  -0.75
Çünkü  -0.72
SourceChecksum  -0.70
}$  -0.68
发表于  -0.67
Theſe  -0.66
transQ  -0.66
GenerationType  -0.65
Monfieur  -0.64
]")]  -0.63
Positive Logits
Please  0.67
please  0.66
Please  0.63
ⓧ  0.59
Check  0.59
bitte  0.59
check  0.57
please  0.55
请  0.55
是非  0.53
Activations Density 0.371%