INDEX
    Explanations

    mentions of powerlessness and existential themes in storytelling

    Negative Logits
    Token     Logit
    "arda"    -0.15
    "egra"    -0.15
    "zew"     -0.15
    " Dirt"   -0.14
    "ibs"     -0.14
    "ysa"     -0.14
    "ille"    -0.14
    "aware"   -0.14
    "obar"    -0.13
    "oup"     -0.13
    Positive Logits
    Token     Logit
    " both"   0.52
    " Both"   0.50
    "both"    0.48
    "Both"    0.46
    " BOTH"   0.45
    " beide"  0.43
    " ambos"  0.40
    "_both"   0.39
    "_BOTH"   0.35
    " обо"    0.34
    Activation Density: 0.375%

    No Known Activations