INDEX
    Explanations

    phrases starting with the word "a"

    the indefinite articles "a" and "an"

    Negative Logits
    "weights"         -0.72
    " TDs"            -0.70
    "orest"           -0.68
    "osc"             -0.68
    "antes"           -0.68
    "ores"            -0.67
    "ometers"         -0.66
    "onto"            -0.66
    " favourites"     -0.66
    "itiz"            -0.65
    Positive Logits
    " nutshell"        1.38
    " twist"           0.90
    " perverse"        0.87
    " contradiction"   0.81
    " brief"           0.80
    " typical"         0.79
    " footnote"        0.79
    " statement"       0.79
    " flurry"          0.79
    " bizarre"         0.78
    Activation Density: 0.070%

    No Known Activations