    Explanations

    references to specific numerical values or identifiers

    Negative Logits
     Efq        -1.20
     Monfieur   -1.02
     whoſe      -1.02
     Jefus      -0.98
     Eſ         -0.95
     Theſe      -0.91
     Houſe      -0.85
     raiſ       -0.85
     greateſt   -0.85
     ―――――      -0.84
    Positive Logits
     I          0.58
     i          0.56
    <eos>       0.55
     my         0.54
    ↵↵          0.49
     Mo         0.46
                0.46
                0.46
    </h4>       0.45
    </i>        0.44
    Activation Density: 0.018%

    No Known Activations