INDEX
    Explanations

    phrases that describe locations or positions relative to other objects

    Negative Logits
    Token               Logit
    " Anſ"              -0.68
    "✭✭"                -0.68
    " houſe"            -0.68
    " Efq"              -0.66
    "ſelf"              -0.65
    " Diſ"              -0.61
    " Reſ"              -0.61
    " jadx"             -0.58
    " DZ"               -0.58
    "ſelves"            -0.57

    Positive Logits
    Token               Logit
    "Beneath"            0.72
    " Near"              0.70
    "neath"              0.70
    " near"              0.70
    " Beneath"           0.68
    " devant"            0.68
    "Near"               0.66
    " возле"             0.66
    "LabelTagHelper"     0.62
    " derrière"          0.61
    Activation Density: 0.228%

    No Known Activations