    Explanations

    specific names and influential figures in various contexts

    NEGATIVE LOGITS
    "_taken"      -0.14
    "λά"          -0.13
    "Thrown"      -0.13
    " risen"      -0.13
    "swers"       -0.13
    " lain"       -0.13
    "리지"        -0.13
    " spans"      -0.13
    " theres"     -0.12
    "thrown"      -0.12
    POSITIVE LOGITS
    " was"        0.29
    " did"        0.28
    " had"        0.28
    " gave"       0.26
    " began"      0.26
    " took"       0.26
    " didn"       0.25
    " has"        0.23
    " came"       0.23
    " went"       0.23
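Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below assumes this dashboard follows that approach. The decoder vector `w_dec`, the unembedding matrix `W_U`, the helper `top_logits`, and the five-token vocabulary are all hypothetical stand-ins with made-up numbers, not this feature's real weights.

```python
import numpy as np


def top_logits(w_dec, W_U, vocab, k=3):
    """Project a feature's decoder direction onto the vocabulary (hypothetical helper).

    w_dec: (d_model,) decoder direction of the feature
    W_U:   (d_model, vocab_size) unembedding matrix
    Returns (negative, positive) lists of (token, logit) pairs.
    """
    logits = w_dec @ W_U                 # (vocab_size,) direct effect on each token's logit
    order = np.argsort(logits)           # token indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy usage: a 2-dimensional model with a 5-token vocabulary (made-up values).
vocab = [" alpha", " beta", " gamma", " delta", " epsilon"]
w_dec = np.array([1.0, 0.0])
W_U = np.array([[2.0, 1.0, -1.5, -3.0, 0.5],
                [0.3, -0.7, 0.9, 0.1, -0.2]])
negative, positive = top_logits(w_dec, W_U, vocab, k=2)
print("NEGATIVE LOGITS:", negative)
print("POSITIVE LOGITS:", positive)
```

Sorting once and slicing both ends keeps the two lists consistent with each other; with a real model, `k=10` would match the ten entries shown per list here.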
    Act Density 1.643%
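Activation density is usually reported as the percentage of sampled tokens on which the feature's activation is nonzero. A minimal sketch, assuming activations arrive as a flat array over a token sample and that "active" means strictly greater than a threshold of 0 (the helper name `activation_density` and the sample values are hypothetical):

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return 100.0 * float(np.mean(acts > threshold))


# Toy sample of 8 token activations (made-up values): 2 of the 8 are active.
acts = np.array([0.0, 0.0, 1.2, 0.0, 0.0, 0.0, 0.4, 0.0])
density = activation_density(acts)
print(f"Act Density {density:.3f}%")  # prints "Act Density 25.000%"
```

At the reported 1.643%, the feature would fire on roughly 16 of every 1,000 sampled tokens.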

    No Known Activations