INDEX
    Explanations

    location relative to objects

    Negative Logits
    Token                Logit
    " destroys"          -1.09
    " =("                -1.01
    "ofition"            -0.98
    " /="                -0.98
    " xiv"               -0.93
    (token not shown)    -0.93
    " steers"            -0.91
    "花の"               -0.91
    (token not shown)    -0.91
    "↵↵↵↵↵↵↵↵↵↵↵"        -0.91
    Positive Logits
    Token                Logit
    " overhead"          1.22
    " što"               1.01
    " comprehensive"     0.97
    " akal"              0.92
    " from"              0.92
    " ginge"             0.91
    " matang"            0.90
    " height"            0.90
    " full"              0.88
    "никами"             0.87
    Activation Density: 0.013%

    No Known Activations