INDEX
Explanations
Sections of text that contain clinical or technical medical terminology
Text followed by the token "Q" or "4"
Q: followed by questions
Negative Logits
Token         Logit
qu            -0.74
sa            -0.72
in            -0.71
الدراسه       -0.71
[…]           -0.71
to            -0.69
m             -0.69
der           -0.69
br            -0.69
bu            -0.68
Positive Logits
Token         Logit
pleaſure      1.10
myſelf        1.07
ſmall         0.98
leſs          0.97
Sarm          0.97
themſelves    0.97
itſelf        0.96
reaſon        0.95
purpoſe       0.95
ſame          0.95
Activations Density: 0.010%