INDEX
    Explanations

    words related to taking or maintaining control

    Negative Logits
    Token               Logit
    "))["               -0.46
    " uits"             -0.45
    " folha"            -0.44
    " Hadd"             -0.44
    " Morin"            -0.40
    "ussch"             -0.40
    " Byers"            -0.40
    " Ebers"            -0.40
    "<?\n"              -0.40
    "}{*}{}"            -0.39
    Positive Logits
    Token               Logit
    "ſelf"              0.59
    " Reſ"              0.58
    " poffe"            0.55
    " itſelf"           0.54
    "ambilan"           0.54
    "principalColumn"   0.53
    " pleaſure"         0.52
    " Conſ"             0.52
    "WriteTagHelper"    0.52
    " GenerationType"   0.51
    Act Density 0.009%

    No Known Activations