INDEX
    Explanations
    Negative Logits
    é             0.96
     in           0.89
    x             0.84
    w             0.82
    cule          0.77
     obsessive    0.77
    v             0.76
     emphas       0.75
     inclin       0.73
     \            0.72
    Positive Logits
    д             1.19
    at            1.10
    Void          0.97
    дят           0.92
    కు            0.89
    ة             0.86
                  0.84
    לים           0.84
    ف             0.84
    <0x80>        0.83
    Act Density 0.005%

    No Known Activations