INDEX
    Explanations

    statements related to authority figures or individuals speaking in a formal context

    Negative Logits
    Token         Logit
    " Corner"     -0.15
    " corner"     -0.14
    " Wildlife"   -0.14
    " Enemy"      -0.14
    "åķı"         -0.14
    "ask"         -0.13
    "Ask"         -0.13
    "entry"       -0.13
    ".ask"        -0.13
    "usher"       -0.13
    Positive Logits
    Token         Logit
    " added"      0.28
    " said"       0.23
    "added"       0.22
    "-added"      0.21
    "_added"      0.20
    " Added"      0.19
    "continued"   0.18
    " ajout"      0.18
    "Added"       0.17
    " continued"  0.17
    Activation Density: 0.030%

    No Known Activations