    Explanations

    mentions of the word "Horse" with varying levels of activation

    references to "Horse" and related terms

    Negative Logits
        " questioning"    -0.69
        " sil"            -0.66
        "reens"           -0.66
        " middle"         -0.64
        "theless"         -0.63
        " fortune"        -0.63
        " mistrust"       -0.63
        " cursing"        -0.62
        " tut"            -0.62
        " semantic"       -0.62
    Positive Logits
        " Horse"           3.75
        " Horses"          1.59
        "horse"            1.28
        " Elephant"        1.26
        " Bunny"           1.13
        " Goat"            1.13
        " Cobra"           1.11
        " Legs"            1.05
        " Sheep"           1.03
        " Toad"            1.03
    Activation Density 0.029%

    No Known Activations