    Explanations

    The word "da" in various contexts

    Negative Logits
      " itſelf"      -1.05
      " Diſ"         -0.98
      " themſelves"  -0.94
      " Reſ"         -0.93
      " leaſt"       -0.87
      " Anſ"         -0.84
      " ſeveral"     -0.82
      " myſelf"      -0.80
      " raiſ"        -0.80
      " poffible"    -0.79

    Positive Logits
      " da"           2.21
      " Da"           2.10
      "Da"            2.00
      "da"            1.53
      " DA"           1.51
      "DA"            1.37
      " Dahl"         1.15
      " да"           1.02
      " DAZ"          0.99
      "Да"            0.91
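The dashboard does not say how these logit lists are produced. One common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and then read off the most negative and most positive tokens. The sketch below uses a toy two-dimensional model; `decoder_dir`, `unembed`, `logit_effects`, `top_and_bottom`, and all of the numbers are hypothetical and are not taken from the real model.

```python
# Hypothetical sketch: assumes each token's logit effect is decoder_dir · W_U[:, token].
# The toy vectors below are invented; they are not the real model weights.

def logit_effects(decoder_dir, unembed, vocab):
    """Map each token to the dot product of decoder_dir with its unembedding column.

    unembed is a d_model x vocab_size matrix stored as a list of rows.
    """
    return {
        tok: sum(d * unembed[i][j] for i, d in enumerate(decoder_dir))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom(effects, k):
    """Return the k most negative and the k most positive (token, logit) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return bottom, top


# Toy model: d_model = 2, vocabulary of four tokens (leading spaces are part of the token).
vocab = [" da", " Diſ", " itſelf", " Dahl"]
decoder_dir = [1.0, 0.5]
unembed = [
    [2.0, 0.0, -1.0, 1.0],
    [0.2, -1.0, -0.4, 0.0],
]

effects = logit_effects(decoder_dir, unembed, vocab)
bottom, top = top_and_bottom(effects, k=2)
print("negative:", bottom)
print("positive:", top)
```

With a real model, `unembed` would have one column per vocabulary entry (tens of thousands of tokens), so a vectorized array library would normally replace these pure-Python loops.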
    Activation density: 0.075%
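Activation density is taken here to mean the fraction of token positions on which the feature's activation is above zero. This is an assumption, because the dashboard does not state its threshold. Under that reading, 0.075% is roughly 75 firing positions per 100,000 tokens. The toy example below reproduces that figure with made-up activations; `activation_density` and its `threshold` parameter are hypothetical.

```python
# Hypothetical sketch: assumes density = fraction of positions with activation > threshold.

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Made-up activations: the feature fires on 75 of 100,000 positions.
acts = [0.0] * 100_000
for i in range(75):
    acts[i * 1000] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # 0.075%
```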

    No known activations.