INDEX
    Explanations

    personal pronouns and common verbs, indicating a focus on personal experiences and interactions

    Negative Logits
     unthinkable    -0.15
     chois          -0.14
    entlich         -0.14
    xDE             -0.14
    burgh           -0.14
     choosing       -0.13
    620             -0.13
    090             -0.13
    lectual         -0.13
     lựa            -0.13
    Positive Logits
     curiosity      0.27
     curious        0.23
     learn          0.22
     learns         0.22
     learning       0.21
     discover       0.21
     Discover       0.20
     lear           0.20
     discovery      0.20
     Cur            0.19
    Act Density 0.011%

    No Known Activations