INDEX
    Explanations

    therefore, introducing the answer

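    Negative Logits (token, value; quotes show leading spaces)
        " explanations"      0.48
        " yes"               0.46
        " justifications"    0.45
        " explanation"       0.42
        " contenders"        0.41
        " suggestion"        0.40
        " suggestions"       0.39
        " explained"         0.39
        ":"                  0.38
        " Yes"               0.38

    Positive Logits (token, value; quotes show leading spaces)
        " مناسب" (Arabic/Persian: "suitable")    0.45
        "最好的" (Chinese: "best")                 0.43
        "neither"                                 0.42
        "ნიშ" (Georgian word fragment)            0.41
        "suitable"                                0.40
        " కల" (Telugu)                            0.40
        "Wt"                                      0.40
        "どちら" (Japanese: "which")               0.40
        " जिसका" (Hindi: "whose")                 0.39
        "SDD"                                     0.39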
    Activation density: 0.010%

    No Known Activations