    Explanations

    the word "flash" at various activations

    Negative Logits
    Token          Logit
    "employed"     -0.79
    "lain"         -0.76
    "thy"          -0.74
    "nil"          -0.74
    "mens"         -0.73
    "avia"         -0.71
    " Guth"        -0.71
    " Kron"        -0.68
    " Cohn"        -0.68
    " Colo"        -0.68
    Positive Logits
    Token          Logit
    " flash"       3.83
    "flash"        2.87
    "Flash"        2.45
    " Flash"       2.39
    " flashes"     2.28
    " flashed"     2.03
    " flashing"    1.94
    " flashlight"  1.55
    " blink"       1.46
    " flashback"   1.33
    Activation Density: 0.015%

    No Known Activations