    Explanations

    phrases related to actions taken or decisions made by different entities or individuals

    Negative Logits
    Token       Logit
    "ety"       -0.67
    "etc"       -0.66
    "orph"      -0.64
    "Soc"       -0.63
    "hack"      -0.63
    "bery"      -0.63
    "igious"    -0.61
    "eth"       -0.58
    "edy"       -0.58
    " awa"      -0.57
    Positive Logits
    Token             Logit
    " hoped"          1.36
    "iths"            1.14
    " originally"     1.08
    " planned"        1.03
    " previously"     0.98
    " initially"      0.92
    " begun"          0.91
    " been"           0.89
    " anticipated"    0.88
    " earlier"        0.87
    Activation Density: 0.179%

    No Known Activations