INDEX
    Explanations

    words related to physical actions or activities involving movement

    Negative Logits
    vale            -0.22
    eus             -0.20
    rophe           -0.20
    ively           -0.19
    ocks            -0.19
    eza             -0.18
    ome             -0.17
    antly           -0.17
    edException     -0.16
    lide            -0.16
    Positive Logits
    bing            0.57
    bed             0.45
    bers            0.42
    ging            0.41
    ming            0.40
    ting            0.39
    ged             0.37
    ber             0.35
    by              0.34
    ding            0.34
    Act Density 0.111%

    No Known Activations