Explanations
phrases related to user actions and interactions with technology
Negative Logits
Token         Logit
\views        -0.16
329           -0.15
699           -0.14
-alist        -0.14
hod           -0.14
599           -0.14
Ñģм           -0.13
clipped       -0.13
593           -0.13
Liberation    -0.13
Positive Logits
Token         Logit
prompt         0.32
directed       0.30
prompt         0.30
prompted       0.28
directing      0.27
Prompt         0.27
prompting      0.27
Prompt         0.26
prompts        0.26
Directed       0.25
Activation Density: 0.105%