INDEX
    Explanations

    phrases indicating justification or excuses for behavior

    Negative Logits
    kj               -0.15
    ÙĤÙī             -0.14
    .fhir            -0.14
    ersiz            -0.14
    eous             -0.14
    aired            -0.14
    verbatim         -0.14
     trú             -0.14
    .AnchorStyles    -0.13
    eec              -0.13
    Positive Logits
     measure          0.29
     respect          0.28
     accounts         0.27
     sense            0.27
     regards          0.27
     stretch          0.26
     extent           0.25
     respects         0.25
     measures         0.25
     degree           0.24
    Activation Density: 0.048%

    No Known Activations