INDEX
    Explanations

    references to comparisons and relationships between entities or concepts

    Negative Logits
    Token       Logit
    "Been"      -0.15
    "eyin"      -0.15
    "ynos"      -0.14
    "radient"   -0.14
    "oi"        -0.13
    "iets"      -0.13
    "ãģ£ãģį"    -0.13
    "átis"      -0.13
    "ØŃØ«"      -0.13
    "øy"        -0.13
    Positive Logits
    Token       Logit
    " does"     0.89
    " did"      0.88
    "does"      0.73
    " do"       0.71
    "did"       0.68
    " Does"     0.66
    " Did"      0.58
    "Did"       0.56
    "Does"      0.56
    " DOES"     0.55
    Activation Density: 0.195%

    No Known Activations