    NEGATIVE LOGITS
    " visible"      -0.81
    "Visible"       -0.72
    " possible"     -0.71
    "visible"       -0.70
    " posibles"     -0.68
    "possible"      -0.66
    "ing"           -0.64
    " possíveis"    -0.61
    " possível"     -0.60
    " möjligt"      -0.60
    POSITIVE LOGITS
    " Theſe"        0.99
    " Efq"          0.96
    " myſelf"       0.87
    " Anſ"          0.81
    " Jefus"        0.81
    " themſelves"   0.78
    " againſt"      0.77
    " Beſ"          0.77
    " pleaſure"     0.77
    " Monfieur"     0.77
    Activation Density: 0.083%

    No Known Activations