INDEX
    Explanations

    phrases indicating importance, such as "the fate of" and "the history of"

    repeated phrases primarily containing the word "of."

    Negative Logits
    !.            -0.77
    ,...          -0.76
    .''.          -0.65
    tackle        -0.65
    .","          -0.64
    !,            -0.64
    .........     -0.63
    ".[           -0.63
    .[            -0.62
    »             -0.62

    Positive Logits
    pires         0.73
     these        0.70
     varies       0.62
     this         0.61
    was           0.60
     hasn         0.60
    allows        0.58
     translates   0.57
     bothers      0.56
    pired         0.56
    Activation Density 0.440%

    No Known Activations