Explanations
command instructions
Explicit meta-instructions about how to respond, especially prohibitions, formatting requirements, and directives to list items or provide examples.
Negative Logits
Wildlife    0.37
líquidos    0.36
an          0.34
基于        0.33
Bone        0.32
Plastics    0.32
ক্যান্সার    0.32
Bikini      0.31
Skin        0.31
Muscle      0.31
Positive Logits
גם          0.45
meno        0.45
murderous   0.43
ezek        0.42
ਓ           0.42
هذا         0.41
nonchal     0.41
ᕈ           0.41
disdain     0.41
concom      0.40
Activations Density 0.397%