INDEX
    Explanations

    words related to causation or logical conclusions

    the word "thus" and its various contexts of usage

    NEGATIVE LOGITS
    "ten"             -0.69
    " Kl"             -0.65
    " Children"       -0.60
    " Polo"           -0.60
    " Ones"           -0.59
    " Scott"          -0.58
    " Leather"        -0.58
    " Lobby"          -0.58
    " Food"           -0.57
    "track"           -0.57
    POSITIVE LOGITS
    "forth"            0.88
    "bered"            0.81
    "forward"          0.79
    " convol"          0.79
    "è£ħ"              0.77
    "mia"              0.77
    " misunder"        0.76
    " guiActiveUn"     0.75
    "ãĤ´ãĥ³"           0.75
    " far"             0.74
    Act Density 0.015%

    No Known Activations