INDEX
    Explanations

    category labels or classifications within the text

    NEGATIVE LOGITS
    šť          -0.16
    гÑĥ         -0.16
    iki         -0.15
    aida        -0.14
    indi        -0.14
    _GU         -0.14
    lingen      -0.14
    ittings     -0.14
    bow         -0.14
     Winds      -0.13
    POSITIVE LOGITS
    é¾          0.17
     Clarkson   0.15
     fat        0.15
    iton        0.15
    ën          0.14
     Ta         0.14
    thood       0.14
     Per        0.14
    tu          0.14
     FAT        0.14
    Activation Density: 0.005%

    No Known Activations