    Explanations

    generating text after "model"

    Negative Logits
    " creeps"      0.40
    " fairies"     0.36
    " oysters"     0.36
    " vacuoles"    0.36
    " weir"        0.35
    " bottles"     0.35
    " slippers"    0.35
    " canoes"      0.35
    " veggies"     0.34
    " tincture"    0.33
    Positive Logits
    "This"         0.42
    "ک"            0.40
    "Python"       0.39
    " இந்த"        0.38
    " மேம்ப"       0.38
    "The"          0.38
    "ChatGPT"      0.37
    " 기본적인"     0.37
    " практи"      0.36
    "改革"         0.36
    Activation Density: 4.633%

    No Known Activations