Explanations
phrases related to health conditions or treatments
Negative Logits
ſtate        -1.25
itſelf       -1.20
Shakspeare   -1.16
Houſe        -1.15
myſelf       -1.15
iſt          -1.14
themſelves   -1.12
Diſ          -1.12
Monfieur     -1.12
uſe          -1.11

Positive Logits
↵↵           0.70
?            0.63
,            0.63
↵↵↵          0.59
.            0.57
:            0.57
«            0.56
<eos>        0.55
!            0.55
A            0.55

Activation density: 0.033%