INDEX
    Explanations

    mentions of specific locations or contexts in stories

    Negative Logits
     has        -0.50
     is         -0.49
     ”          -0.47
     as         -0.45
     “          -0.45
     for        -0.43
     it         -0.43
     {          -0.42
     cons       -0.42
    }';         -0.41
    Positive Logits
     lenker           1.04
     houſe            1.01
     ſmall            0.91
     purpoſe          0.90
     tartalomajánló   0.89
     poffe            0.89
     ſtate            0.88
     Jefus            0.87
    \{\\              0.87
     NSCoder          0.86
    Act Density 0.312%

    No Known Activations