    NEGATIVE LOGITS
    " unreachable"      -0.08
    "Affordable"        -0.07
    " zoals"            -0.07   (Dutch: "such as")
    " såsom"            -0.07   (Swedish: "such as")
    "arele"             -0.07
    " apology"          -0.07
    "$a"                -0.07
    " AFL"              -0.07
    " folklore"         -0.07
    " affordable"       -0.07
    POSITIVE LOGITS
    " perpendicular"    0.17
    "pendicular"        0.14
    " orientation"      0.13
    " axis"             0.12
    " orientations"     0.12
    " Orientation"      0.12
    " rotated"          0.12
    " axes"             0.11
    [token not shown]   0.11
    " вертик"           0.11    (Russian: start of "вертикальный", "vertical")
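The page does not say how these logit lists are produced. A common approach for sparse-autoencoder feature dashboards, assumed here rather than confirmed, is to multiply the feature's decoder direction by the model's unembedding matrix and rank the per-token scores: the highest scores are the tokens the feature promotes (positive logits), the lowest are the tokens it suppresses (negative logits). The sketch below uses plain Python lists and made-up toy numbers; `logit_effects` and `top_and_bottom` are hypothetical helper names, and a real pipeline may also fold in layer norm or other scaling.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers: assumes the logit lists come from projecting the
# feature's decoder direction through the unembedding matrix w_u.


def logit_effects(decoder_dir: Sequence[float],
                  w_u: Sequence[Sequence[float]]) -> List[float]:
    """Return one score per vocabulary token, i.e. decoder_dir @ w_u.

    w_u is indexed as w_u[model_dim][token_id].
    """
    d_model = len(decoder_dir)
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Split ranked tokens into the k most promoted and k most suppressed."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    positive = ranked[::-1][:k]  # largest scores first
    negative = ranked[:k]        # most negative scores first
    return positive, negative


# Toy values invented for illustration only (d_model = 2, vocabulary of 4).
vocab = [" perpendicular", " axis", " apology", " folklore"]
decoder_dir = [1.0, 0.5]
w_u = [
    [0.10, 0.08, -0.05, -0.04],
    [0.14, 0.08, -0.04, -0.02],
]
scores = logit_effects(decoder_dir, w_u)
positive, negative = top_and_bottom(scores, vocab, k=2)
print([(tok, round(s, 2)) for tok, s in positive])
print([(tok, round(s, 2)) for tok, s in negative])
```

Ranking a single dot product per token keeps the sketch readable; with real weights the same computation is one matrix-vector product over the full vocabulary.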
    Activation density: 0.043%
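Activation density is presumably the share of sampled tokens on which the feature's activation is nonzero; the page does not define it, so treat that as an assumption. At 0.043%, that is 43 firing tokens per 100,000, or roughly 1 in 2,300. A minimal sketch with a hypothetical `activation_density` helper and synthetic activations:

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if len(activations) == 0:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Synthetic example: 100,000 tokens, of which 43 activate the feature.
acts = [0.0] * 99_957 + [1.0] * 43
density = activation_density(acts)
print(f"{density:.3%}")
```

The `threshold` parameter is an illustrative choice; a dashboard could instead count any strictly positive activation, which is the default here.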

    No Known Activations