INDEX
Explanations
consistent phrases indicating similarity or uniformity across different contexts
Negative Logits
Token               Logit
ligiloj             -0.61
يتيمه               -0.54
principalTable      -0.50
oredCriteria        -0.49
'){                 -0.49
ThroughAttribute    -0.48
TestBed             -0.47
ulum                -0.46
)*/                 -0.46
)';                 -0.46
Positive Logits
Token               Logit
unmodified          0.67
iguales             0.66
forall              0.64
Iden                0.64
identical           0.61
scolaires           0.61
seragam             0.60
addContainerGap     0.60
unchanged           0.60
geblieben           0.60
Activations Density: 0.529%