    Explanations

    references to spatial relationships or positioning

    Negative Logits
     themſelves  -0.88
     Theſe       -0.87
     itſelf      -0.83
     himſelf     -0.76
     Anſ         -0.75
     Majefty     -0.72
     Chriftian   -0.70
     ་་          -0.69
     Efq         -0.68
     kwanza      -0.67
    Positive Logits
     с           0.85
     по          0.80
    С            0.71
     С           0.69
     со          0.64
     sweet       0.62
     za          0.62
     s           0.61
    sweet        0.61
     con         0.59
    Activation Density: 0.015%

    No Known Activations