    NEGATIVE LOGITS
    " fiabilité"      0.41
    "рафон"           0.40
    " infrastrukt"    0.40
    " peuple"         0.40
    "ологи"           0.38
    " imigr"          0.38
    "اؤن"             0.38
    " entidad"        0.38
    " lidí"           0.38
    "Teddy"           0.38
    POSITIVE LOGITS
    "T"               0.46
    "h"               0.45
    "color"           0.44
    "ver"             0.43
    "R"               0.42
    "j"               0.42
    "p"               0.42
    "g"               0.42
    "c"               0.42
    "P"               0.41
    Activation Density: 0.001%

    No Known Activations