INDEX
    Explanations

    phrases related to the concept of 'outside' or external environment

    references to the concept of 'outside'

    Negative Logits
    Token            Logit
    "ony"            -0.82
    "enegger"        -0.79
    "anwhile"        -0.75
    "orah"           -0.75
    "arrell"         -0.74
    "hered"          -0.71
    "onder"          -0.70
    "oran"           -0.69
    "ndra"           -0.67
    "riages"         -0.66

    Positive Logits
    Token            Logit
    " observer"       0.95
    " world"          0.79
    " observers"      0.77
    " linebackers"    0.75
    "most"            0.74
    " linebacker"     0.73
    " hemisphere"     0.73
    "world"           0.72
    " Outs"           0.71
    " appearance"     0.70

    Activation Density: 0.054%

    No Known Activations