    Explanations

    connections between events and their consequences or details within a narrative context

    Negative Logits
    Token        Logit
    "ische"      -0.23
    "isches"     -0.21
    " schöne"    -0.21
    " ganze"     -0.20
    "die"        -0.19
    " gute"      -0.19
    " Ihre"      -0.18
    " our"       -0.18
    " вияв"      -0.18
    " erste"     -0.17
    Positive Logits
    Token        Logit
    " einem"     0.40
    " dem"       0.39
    " den"       0.38
    " der"       0.36
    " einer"     0.36
    " diesem"    0.30
    " seinem"    0.29
    " ihrem"     0.28
    " denen"     0.27
    " allen"     0.25
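
    The two logit lists read as the lowest- and highest-scoring tokens for this feature. A minimal sketch of that ranking step follows, assuming the per-token scores are already available as a mapping. The function name `rank_tokens` and the dictionary layout are hypothetical, and only a subset of the values listed above is included.

```python
def rank_tokens(effects, k):
    """Return (most_negative, most_positive) lists of (token, score) pairs.

    `effects` maps a token string to its logit score. Leading spaces are
    part of the token and are preserved.
    """
    ordered = sorted(effects.items(), key=lambda item: item[1])
    most_negative = ordered[:k]
    most_positive = ordered[::-1][:k]
    return most_negative, most_positive


# Subset of the scores shown in the lists above (not the full vocabulary).
effects = {
    "ische": -0.23,
    "isches": -0.21,
    " ganze": -0.20,
    " einem": 0.40,
    " dem": 0.39,
    " den": 0.38,
}

negative, positive = rank_tokens(effects, k=2)
print(negative)
print(positive)
```

    With the full vocabulary in place of this subset, the same call with k=10 would yield lists of the same shape as the ones above.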
    Activation Density: 0.029%
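
    The density figure is not defined in this listing. A common reading, assumed here, is the fraction of token positions at which the feature's activation is above zero. The sketch below computes it for a made-up sample. The function name `activation_density`, the threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The source does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 100,000 positions, 29 of them with a nonzero activation.
sample = [0.0] * 99_971 + [0.5] * 29
density = activation_density(sample)
print(f"{density:.3%}")
```

    Under this reading, a density of 0.029% corresponds to roughly 29 active positions per 100,000 tokens.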

    No Known Activations