Explanations
mentioning specific details
Negative Logits
Question      0.44
Choose        0.43
Understanding 0.42
Abandon       0.41
Aboriginal    0.39
Answer        0.38
Choosing      0.38
stellte       0.38
answer        0.38
Determine     0.37
Positive Logits
언급            0.76
mention       0.73
mentions      0.69
mentioning    0.69
Mention       0.67
mention       0.65
जिक्र           0.63
mencion       0.62
menciona      0.61
упомина       0.60
Activations Density 0.148%