Explanations
phrases related to waiting or delays in processes
Negative Logits
Token   Logit
91      -0.18
41      -0.18
94      -0.17
71      -0.17
83      -0.16
loat    -0.15
43      -0.15
87      -0.15
39      -0.15
79      -0.15
Positive Logits
Token   Logit
300     0.26
500     0.25
100     0.21
800     0.21
150     0.21
250     0.20
400     0.20
600     0.19
350     0.17
50      0.17
Activations Density: 0.263%