INDEX
Explanations
phrases indicating admissions or confessions
admitted to / confessed to
Negative Logits
because         -0.41
extranjera      -0.39
bienven         -0.39
vectorielles    -0.38
Apalagi         -0.38
précisément     -0.37
capitale        -0.37
because         -0.37
sprawy          -0.36
dépla           -0.36
Positive Logits
AssemblyTitle   0.53
PropertyGroup   0.51
っこう          0.51
>=",            0.50
<>",            0.49
LLocation       0.49
toise           0.48
inSlope         0.47
loit            0.46
GEBURTS         0.45
Activations Density: 0.012%