INDEX
Explanations
commands or instructions listed in a structured format
phrases that indicate guidelines or protocols
Negative Logits
minist      -0.80
unts        -0.77
Newsletter  -0.73
aukee       -0.73
cest        -0.70
obyl        -0.69
Ott         -0.69
izont       -0.68
opter       -0.67
wr          -0.67
Positive Logits
:-          0.96
:"          0.93
:{          0.87
>:          0.86
:]          0.85
follows     0.82
bourg       0.80
:           0.75
:#          0.70
:(          0.69
Activations Density 0.014%