INDEX
Explanations
references to historical figures or events related to medical advancements
Negative Logits
Token       Logit
190         -0.26
189         -0.24
191         -0.23
188         -0.23
187         -0.20
192         -0.19
nict        -0.19
193         -0.18
Alfred      -0.17
telegram    -0.17
Positive Logits
Token            Logit
176              0.47
177              0.42
175              0.42
174              0.41
178              0.41
179              0.38
173              0.36
172              0.33
Enlightenment    0.31
171              0.29
Activations Density: 0.146%