INDEX
    Explanations

    expressive language that emphasizes determination or insistence on a particular point

    Negative Logits (token, logit; quotes preserve leading spaces)
      " []:"        -0.51
      "istoitu"     -0.50
      " ("          -0.49
      " rang"       -0.49
      ","           -0.48
      "</td>"       -0.48
      " in"         -0.47
      "<eos>"       -0.47
      " had"        -0.47
      " <"          -0.46
    Positive Logits (token, logit; quotes preserve leading spaces)
      " pleaſure"    1.09
      " Experiment"  1.06
      " myſelf"      1.04
      " itſelf"      1.04
      "experiment"   1.03
      "Experiment"   0.99
      " expériment"  0.99
      " purpoſe"     0.98
      " Monfieur"    0.97
      " ſtate"       0.96
    Activation Density: 0.089%

    No Known Activations