INDEX
Explanations
questions beginning with "why" or "how."
Negative Logits
NUMX        -0.58
Поэтому     -0.49
sécher      -0.44
קישורים     -0.43
{}".        -0.42
oof         -0.42
escrit      -0.41
{}'.        -0.40
elif        -0.40
aarrggbb    -0.39
Positive Logits
How         0.92
Why         0.89
How         0.88
What        0.84
Why         0.84
What        0.81
Who         0.80
Who         0.76
¿           0.73
Which       0.71
Activations Density 0.227%