Explanations
the presence of the word "ten" or variations thereof
Negative Logits
Token            Logit
ویکی             -0.56
subsubsection    -0.54
ScopeManager     -0.53
eens             -0.50
Knapp            -0.49
rande            -0.49
ValueStyle       -0.49
مُعرِّف            -0.49
mical            -0.48
grat             -0.47
Positive Logits
Token            Logit
invokingState    1.04
AndEndTag        0.92
Commandments     0.89
commandments     0.77
featureID        0.76
Downing          0.74
ंदीखरीदारी        0.72
Tors             0.71
kaynağından      0.70
0                0.69
Activations Density: 0.305%