INDEX
Explanations
question marks and response indicators in dialogues
Negative Logits
Token         Logit
why           -0.43
apakah        -0.34
warum         -0.34
does          -0.32
did           -0.32
آیا           -0.29
whoever       -0.29
這就是        -0.28
ViewImports   -0.28
maybe         -0.28
Positive Logits
Token         Logit
How           1.30
What          1.29
How           1.04
What          1.01
Where         0.91
Which         0.87
Where         0.78
Who           0.77
httphttps     0.72
Which         0.71
Activations Density 0.389%