    Explanations

    references to the word "fox" in various contexts

    Negative Logits
    Token           Logit
    "Effective"     -0.88
    "Rated"         -0.76
    "itutional"     -0.71
    "ETH"           -0.71
    "affer"         -0.67
    "Interstitial"  -0.66
    "NT"            -0.66
    "igious"        -0.66
    "apter"         -0.63
    "FINE"          -0.63
    Positive Logits
    Token           Logit
    "es"            1.02
    " fox"          0.86
    " squirrel"     0.86
    "bat"           0.82
    "manship"       0.82
    "esy"           0.81
    "hound"         0.81
    "sey"           0.76
    "fox"           0.76
    "bench"         0.75
    Activation Density: 0.018%

    No Known Activations