INDEX
    Explanations

    words related to reasoning or cause and effect

    instances of the word "thus" as a connector in sentences

    Negative Logits
    Token            Logit
    " Kl"            -0.69
    " Ones"          -0.64
    " Kelvin"        -0.61
    "kick"           -0.61
    " Polo"          -0.59
    "Don"            -0.59
    "ertodd"         -0.59
    " MPH"           -0.58
    " Coffee"        -0.58
    "ropolitan"      -0.58
    Positive Logits
    Token            Logit
    "forth"          1.14
    "forward"        0.85
    "bered"          0.84
    "mia"            0.84
    "mask"           0.79
    "othe"           0.76
    " far"           0.75
    " guiActiveUn"   0.74
    "aper"           0.73
    "hiba"           0.73
    Act Density 0.026%

    No Known Activations