Explanations
emphatic words and phrases indicating necessity or importance
strong recommendations or requirements
Negative Logits
للمعارف    -0.64
برانيه    -0.61
GOTREF    -0.59
⟬    -0.56
AutoScale    -0.56
addContainerGap    -0.56
surla    -0.56
ModelExpression    -0.53
onoi    -0.52
KommentareTeilen    -0.52
Positive Logits
MUST    0.46
絶対に    0.43
must    0.43
safety    0.42
SAFETY    0.40
!!!    0.40
kesin    0.39
MUST    0.38
absolutely    0.38
sizePolicy    0.38
Activations Density: 0.019%