INDEX
    Explanations

    Occurrences of the word "hello".

    NEGATIVE LOGITS
    Token                 Logit
    "ftagPool"            -0.58
    " Chwiliwch"          -0.54
    " дописавши"          -0.48
    "Parcelize"           -0.45
    "存于互联网档案馆"    -0.43
    " raiſ"               -0.42
    "apore"               -0.42
    "tisgarh"             -0.41
    " leſs"               -0.41
    " BoxFit"             -0.41
    POSITIVE LOGITS
    Token                 Logit
    " Hello"              1.61
    " hello"              1.49
    "Hello"               1.48
    "hello"               1.30
    " HelloWorld"         1.28
    " HELLO"              1.24
    "HELLO"               1.23
    "HelloWorld"          1.08
    "Greeting"            1.00
    "helloworld"          0.95
    Activation Density 0.005%

    No Known Activations