INDEX
Explanations
assertive statements about decisions or actions
NEGATIVE LOGITS
).      -0.68
)*/     -0.66
[]){    -0.65
}>;     -0.64
*/;     -0.63
))){    -0.63
%";     -0.62
});     -0.61
"));    -0.60
)       -0.58
POSITIVE LOGITS
,”              0.80
,"              0.78
",              0.72
”,              0.70
,''             0.69
,’’             0.65
<>",            0.61
,”              0.61
Билгалдахарш    0.60
<<<<<<<<<<<<<<  0.59
Activations Density 0.483%