INDEX
    Explanations

    familial relationships and dynamics

    Negative Logits
     beſte       -0.80
     eſſ         -0.76
     pleaſure    -0.75
     faſt        -0.73
     ſta         -0.73
     queſta      -0.73
     ſtate       -0.73
     ſua         -0.72
    ſelf         -0.71
     purpoſe     -0.70
    Positive Logits
     spoiled      0.69
     spoilt       0.68
     spoiling     0.56
     adored       0.50
     pam          0.48
     spoil        0.44
                  0.42
     born         0.40
     sibling      0.39
     cherished    0.39
    Act Density 0.041%

    No Known Activations