INDEX
Explanations
terms related to morality and ethics
Negative Logits
Token         Logit
Silvio        -0.81
Guilherme     -0.80
Dada          -0.77
meninos       -0.77
Madeira       -0.76
Padua         -0.75
Madagascar    -0.75
valentin      -0.75
>>(           -0.75
Appleton      -0.75
Positive Logits
Token         Logit
Morales       1.16
Mor           1.13
Mor           1.03
MOR           0.96
mor           0.90
morales       0.85
mor           0.82
Mortimer      0.80
Moran         0.80
MOR           0.79
Activations Density: 0.216%