Explanations
statements of evaluation or judgment
statements of opinion or assertions
Negative Logits
enne        -0.82
Remastered  -0.78
arella      -0.76
ãĤ£         -0.75
ventures    -0.73
ipl         -0.71
Emer        -0.71
annabin     -0.71
»Ĵ          -0.70
Horizons    -0.69
Positive Logits
counterproductive  1.56
disrespectful      1.55
unethical          1.47
immoral            1.42
irresponsible      1.41
unacceptable       1.39
hypocritical       1.35
insulting          1.31
unfair             1.31
wasteful           1.30
Activations Density 0.242%
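
As a rough illustration of the density figure above, the sketch below assumes that activation density means the fraction of tokens on which the feature fires at all (activation above zero). The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature
    is active, not an average of activation magnitudes.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 tokens, 242 of which activate the feature.
sample = [0.0] * 99_758 + [1.3] * 242
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.242%
```

Under this reading, a density of 0.242% would mean the feature is active on roughly 1 token in 400.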