INDEX
    Explanations

    the adjective "heavy" followed by a noun

    Negative Logits
     myſelf             -1.11
     '\\;'              -1.07
     Theſe              -1.05
     ་་                 -1.02
     ―――――              -0.99
     Efq                -0.96
     itſelf             -0.95
     Reſ                -0.92
    $.↵                 -0.92
     ModelExpression    -0.91
    Positive Logits
     a                  0.69
     an                 0.68
    ↵↵                  0.67
    ;                   0.66
    n                   0.65
    <eos>               0.65
    .                   0.63
     the                0.63
     with               0.62
    :                   0.59
    Act Density 0.441%

    No Known Activations