INDEX
    Explanations

    references to physical walls

    Negative Logits
        Token         Logit
        "Gene"        -0.79
        "amate"       -0.68
        "CENT"        -0.67
        "ptive"       -0.67
        "lihood"      -0.66
        "forward"     -0.66
        "milo"        -0.65
        "munition"    -0.64
        "phrine"      -0.62
        "×ķ"          -0.62
    Positive Logits
        Token           Logit
        "papers"        1.17
        "abies"         1.14
        "aby"           1.08
        " wall"         0.92
        "clock"         0.89
        " walls"        0.89
        "top"           0.85
        " wart"         0.84
        "tops"          0.82
        " thickness"    0.82
    Act Density 0.012%

    No Known Activations