INDEX
    Explanations

    The word "mostly"; also weakly associated with the word "however".

    Negative Logits
     myſelf     -1.73
     itſelf     -1.63
     pleaſure   -1.62
     Efq        -1.60
     purpoſe    -1.54
     raiſ       -1.50
     houſe      -1.48
     whoſe      -1.45
     Anſ        -1.44
     Theſe      -1.41
    Positive Logits
    ↵↵          0.94
    er          0.94
    ,           0.91
     (          0.91
    ␣␣          0.90
    s           0.86
    e           0.82
    <eos>       0.81
     "          0.81
    .           0.80
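    The Negative and Positive Logits lists appear to rank vocabulary tokens by how strongly this feature pushes their logits down or up. Below is a minimal sketch of how such a ranking could be produced, assuming (this page does not confirm it) that each score is the dot product of the feature's decoder direction with a token's column of the unembedding matrix. The helpers `logit_effects` and `top_and_bottom` and the toy matrices are hypothetical.

```python
def logit_effects(decoder_dir, unembed):
    """Score each vocabulary token by the dot product of the feature's
    decoder direction (length d_model) with that token's column of the
    unembedding matrix (d_model rows x vocab_size columns).
    Assumed definition; not confirmed by the dashboard."""
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]


def top_and_bottom(vocab, scores, k):
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative score first
    positive = ranked[::-1][:k]    # most positive score first
    return negative, positive


# Toy example: d_model = 2, vocabulary of 4 tokens (all values hypothetical).
vocab = ["a", "b", "c", "d"]
decoder_dir = [1.0, -0.5]
unembed = [
    [0.2, -1.0, 0.5, 0.0],
    [0.4, 0.6, -1.0, 0.3],
]
scores = logit_effects(decoder_dir, unembed)
negative, positive = top_and_bottom(vocab, scores, k=2)
print([t for t, _ in negative])  # ['b', 'd']
print([t for t, _ in positive])  # ['c', 'a']
```

    With a real model the same ranking would run over the full vocabulary, which is how tokens such as " myſelf" or "↵↵" could surface at the extremes.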
    Activation Density: 1.560%
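    Activation density is presumably the share of token positions on which the feature is active. The following is a minimal sketch, assuming "active" means an activation strictly above a threshold of 0; the dashboard's actual threshold and sample size are not stated here, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds threshold.
    The default threshold of 0.0 is an assumption."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: active on 39 of 2,500 token positions.
sample = [1.0] * 39 + [0.0] * 2461
print(f"Act Density {activation_density(sample):.3f}%")  # Act Density 1.560%
```

    Under this assumption, a density of 1.560% corresponds to roughly 1 active position in every 64 tokens.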

    No Known Activations