INDEX
Explanations
phrases relating to requests or instructions
Negative Logits
divor     -0.58
iolet     -0.57
apor      -0.56
satur     -0.56
segreg    -0.55
flying    -0.54
blat      -0.54
redo      -0.54
quadru    -0.53
appropri  -0.53
POSITIVE LOGITS
of        0.86
thereof   0.74
OF        0.74
aeper     0.72
Of        0.72
OF        0.71
oft       0.68
whims     0.67
forts     0.65
liest     0.65
Activations Density 0.207%