Explanations
introduces explanations or contrasts
Negative Logits
seine        0.42
affiche      0.40
[])          0.39
帏           0.39
commentaire  0.39
affiche      0.38
copyspace    0.38
них          0.38
든지         0.38
anderem      0.38
Positive Logits
While        0.91
Despite      0.85
There        0.80
Often        0.78
Even         0.77
Because      0.76
Although     0.76
Many         0.76
Perhaps      0.73
Though       0.73
Activations Density 3.078%