Explanations
phrases indicating temporal relationships or connections
independently or given
Negative Logits
featureID         -0.91
Roskov            -0.82
verwijspagina     -0.71
disambiguazione   -0.69
StructEnd         -0.63
InitVars          -0.61
فريبيس            -0.61
defStyleAttr      -0.60
AndEndTag         -0.60
autorytatywna     -0.59
Positive Logits
equally           0.35
ENOT              0.35
igualmente        0.33
yet               0.32
tvguidetime       0.32
similarly         0.31
necesariamente    0.31
Fä                0.31
inom              0.30
necessarily       0.30
Activations Density 0.153%