INDEX
    Explanations
    Negative Logits
     telev          -0.07
     fruitful       -0.07
     Stop           -0.07
     gonna          -0.07
     kuvvet         -0.07
                    -0.07
    Spot            -0.07
    -song           -0.06
                    -0.06
    ;set            -0.06
    Positive Logits
     adher          0.10
     adhere         0.09
     adherence      0.08
    devices         0.07
    aders           0.07
    Refer           0.07
    herent          0.06
    Neither         0.06
     Responses      0.06
     addresses      0.06
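    The page does not say how the logit lists were produced. One common approach for SAE feature dashboards, assumed here and not confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and report the highest and lowest resulting logits. The sketch below uses pure Python, and every name in it (feature_logit_effects, decoder_dir, unembed, vocab) is hypothetical. Real implementations may also normalize the decoder direction or apply the final layer norm first.

```python
from typing import List, Sequence, Tuple


def feature_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # hypothetical layout: [d_model][vocab_size]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Project a feature's decoder direction onto the vocabulary (assumed method).

    Returns (top_positive, top_negative), each a list of (token, logit) pairs.
    """
    d_model = len(decoder_dir)
    # logits[j] = sum over i of decoder_dir[i] * unembed[i][j]
    logits = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort tokens from most negative to most positive logit.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    top_negative = ranked[:k]        # most negative first
    top_positive = ranked[::-1][:k]  # most positive first
    return top_positive, top_negative
```

    With a real model, unembed would be the unembedding weights and k = 10 would give lists of the same length as those shown above.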
    Activation Density: 0.004%
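    Activation density is read here as the percentage of token positions on which the feature fires. The threshold is an assumption: the sketch treats any activation above 0.0 as "firing", and the page does not state the cutoff it actually used. The function name activation_density is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold` (assumed cutoff)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total
```

    Under this definition, a density of 0.004% corresponds to roughly 4 active positions per 100,000 tokens.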

    No Known Activations