INDEX
    Explanations

    references to the field of science or scientific concepts

    words related to conscience or ethical considerations

    Negative Logits
    stage      -0.79
    lift       -0.68
    ting       -0.68
    ton        -0.67
    lain       -0.65
    stood      -0.63
    nikov      -0.62
    managed    -0.61
    TON        -0.61
     tolerate  -0.60
    Positive Logits
    ences      1.00
    zona       0.87
    ppo        0.86
    pe         0.85
    ptions     0.85
    ardo       0.84
    ption      0.84
    oglu       0.84
    emi        0.83
    otti       0.83
    Activation Density: 0.023%

    No Known Activations