INDEX
    Explanations

    StableLM, Google DeepMind

    NEGATIVE LOGITS
    " Norwalk"        1.46
    " Livermore"      1.30
    " Ketch"          1.28
    " FXR"            1.20
    " Anaheim"        1.19
    " Bridgeport"     1.16
    "Connecticut"     1.14
    " Warrington"     1.13
    " haystack"       1.13
    " Connecticut"    1.12
    POSITIVE LOGITS
    " Afrika"          1.51
    " Durban"          1.44
    " Johannesburg"    1.44
    " apartheid"       1.43
    "Afrika"           1.43
    " Mandela"         1.42
    " South"           1.40
    " Gauteng"         1.37
    " Zulu"            1.35
    " Cape"            1.32
    Act Density 0.306%

    No Known Activations