Explanations
phrases related to rules, responsibilities, adherence to standards, and consequences of actions
Negative Logits
ifter      -0.71
Croat      -0.71
beans      -0.69
Crusader   -0.67
boa        -0.67
andel      -0.66
AIN        -0.66
Straw      -0.66
fly        -0.63
Rasmussen  -0.63
Positive Logits
thereto    1.51
xual       1.00
entious    0.98
unto       0.94
pires      0.93
itiz       0.91
gypt       0.89
lectic     0.82
ÃĽ          0.82
imus       0.82
Activation Density: 2.534%