Explanations
"What is" or "What's" questions
Negative Logits
contributed      0.49
influenced       0.48
contributes      0.47
Ы                0.45
distinguishes    0.45
elevates         0.44
constitutes      0.44
the              0.43
differentiates   0.43
complicates      0.42
Positive Logits
Stakes           0.43
धीरे              0.42
Wählen           0.41
Germans          0.41
nouă             0.40
indef            0.39
बंधन              0.39
scegliere        0.39
escoger          0.39
別人             0.39
Activations Density: 0.006%