INDEX
Explanations
phrases indicating certainty or strong predictions about future actions or events
Negative Logits
ViewFeatures     -0.78
AssemblyCulture  -0.69
Verdi            -0.64
entyfik          -0.64
Ratna            -0.64
Cowell           -0.62
serine           -0.60
Micron           -0.60
ModelState       -0.59
CORBA            -0.59
Positive Logits
only             1.22
Only             1.01
лишь             1.00
Only             0.95
ONLY             0.95
ONLY             0.92
only             0.88
Sólo             0.85
Chỉ              0.83
onely            0.81
Activations Density 0.121%