INDEX
    Explanations

    items in a list

    Negative Logits
    "birth"          -0.69
    "displayText"    -0.60
    "BUS"            -0.58
    " Aber"          -0.58
    "IL"             -0.57
    "wav"            -0.56
    " Equality"      -0.55
    "forth"          -0.55
    " lat"           -0.55
    "perty"          -0.55
    Positive Logits
    "ening"          1.35
    "eners"          1.11
    "ened"           1.06
    "ener"           1.03
    "ing"            0.90
    "erv"            0.80
    "ensen"          0.78
    "enhagen"        0.74
    "geist"          0.74
    "enf"            0.74
    Act Density 5.105%

    No Known Activations