INDEX
    Explanations

    possessions or characteristics associated with a specific entity

    Negative Logits
    Token        Logit
    "eting"      -0.16
    " sooner"    -0.16
    "ØŃ"         -0.15
    "ozy"        -0.14
    "velt"       -0.14
    "ey"         -0.14
    "andbox"     -0.14
    "aux"        -0.14
    " Harden"    -0.14
    "eval"       -0.13

    Positive Logits
    Token        Logit
    "orraine"    0.18
    "ense"       0.17
    "utter"      0.17
    "Univers"    0.17
    "ivers"      0.17
    "abyrin"     0.17
    " Univers"   0.17
    "alin"       0.16
    "ourd"       0.16
    "ors"        0.16
    Act Density 0.019%

    No Known Activations