INDEX
Explanations
references to historical medical practices and notable figures in medicine
Negative Logits
Token        Logit
190          -0.25
189          -0.23
191          -0.21
188          -0.21
192          -0.20
194          -0.17
telegram     -0.17
UnderTest    -0.17
193          -0.17
195          -0.16
Positive Logits
Token            Logit
176              0.43
177              0.42
178              0.40
179              0.40
175              0.36
174              0.35
173              0.34
Enlightenment    0.34
172              0.30
Enlight          0.26
Activations Density: 0.288%