INDEX
NEGATIVE LOGITS
ovember         -0.83
uberty          -0.76
»Ĵ              -0.76
hello           -0.69
agues           -0.69
rients          -0.69
Airl            -0.68
Waves           -0.67
Towns           -0.67
Shots           -0.66
POSITIVE LOGITS
unwillingness   1.83
indifference    1.77
hypocrisy       1.75
disregard       1.74
arrogance       1.69
ignorance       1.65
inability       1.61
incompetence    1.60
incap           1.54
complicity      1.53
ACTIVATIONS DENSITY
0.379%