    Explanations

    the word "switch" with a high activation value

    instances of the word "switch."

    Negative Logits
    "za"           -0.81
    "apolis"       -0.72
    "ORED"         -0.65
    "kamp"         -0.65
    "nutrition"    -0.64
    "vez"          -0.63
    "fare"         -0.62
    "USD"          -0.62
    "our"          -0.62
    "zza"          -0.61

    Positive Logits
    "blade"         1.02
    "grass"         1.01
    " switch"       0.89
    "aroo"          0.89
    "gear"          0.85
    "switch"        0.84
    "backs"         0.83
    " switches"     0.83
    " gears"        0.76
    "uese"          0.73

    Act Density 0.019%

    No Known Activations