INDEX
    Explanations

    instances of the word "two" occurring at different numerical values

    Negative Logits
    rim         -0.73
    rays        -0.70
    urated      -0.69
    tre         -0.68
    rams        -0.66
    andise      -0.66
    ãĥ¤         -0.65
    roots       -0.65
    orius       -0.65
    nect        -0.65
    Positive Logits
     thirds       0.88
     dozen        0.86
     hundred      0.86
     glance       0.80
     depending    0.75
     ago          0.74
     apiece       0.74
     thousand     0.73
     batches      0.67
     beforehand   0.67
    Activation Density: 0.021%

    No Known Activations