    Explanations

    terms related to property damage or loss

    Negative Logits
        " Ye"      -0.17
        " YE"      -0.16
        "lox"      -0.15
        " Yard"    -0.15
        "fos"      -0.15
        "äter"     -0.15
        "änn"      -0.15
        "ÄIJT"     -0.14
        "igte"     -0.14
        "šku"      -0.14
    Positive Logits
        "y"         1.28
        "yb"        0.51
        "yw"        0.50
        "ythe"      0.46
        "yh"        0.43
        "yth"       0.41
        "yk"        0.40
        "yro"       0.38
        "yg"        0.38
        "yi"        0.38
    Activation Density: 0.043%

    No Known Activations