INDEX
Explanations
commands or instructions
promotional phrases or instructions related to product addition and offers
Negative Logits
Hua        -0.84
ISIL       -0.78
Goldstein  -0.77
Cohn       -0.72
Jill       -0.71
ca         -0.70
ãĤ§        -0.68
Stella     -0.68
Wilson     -0.68
Elsa       -0.67
Positive Logits
4          1.52
4          1.49
Four       1.15
four       1.14
four       1.11
Four       1.09
Fourth     0.96
Fourth     0.95
fourth     0.93
IV         0.92
Activations Density 0.234%