INDEX
Explanations
phrases related to guidance, instructions, and following a specific path or set of rules
phrases related to following rules and instructions
Negative Logits
tu          -0.80
etheless    -0.69
venge       -0.68
vu          -0.67
enf         -0.66
cest        -0.65
afety       -0.64
omnia       -0.64
pora        -0.63
adjusted    -0.63
Positive Logits
footsteps      1.19
closely        1.02
steps          0.92
instructions   0.91
path           0.85
directions     0.81
route          0.81
blindly        0.78
guidelines     0.75
whims          0.73
Activations Density: 0.178%