INDEX
    Explanations

    references to overarching concepts or themes

    Negative Logits
    Token        Logit
    "zan"        -0.17
    "ulse"       -0.16
    "ëĭµ"        -0.16
    "het"        -0.15
    "jen"        -0.15
    "ists"       -0.15
    "light"      -0.14
    "quelle"     -0.14
    "coni"       -0.14
    "ube"        -0.14
    Positive Logits
    Token        Logit
    " thing"     0.30
    "heart"      0.28
    " entire"    0.25
    "-hearted"   0.23
    " ench"      0.21
    "thing"      0.20
    "-sale"      0.20
    " Thing"     0.20
    "/part"      0.19
    "meal"       0.19
    Activation Density: 0.020%

    No Known Activations