INDEX
Explanations
words related to positive attributes or contributions
Negative Logits
Token          Logit
dar            -0.78
Lumpur         -0.71
mares          -0.70
creen          -0.67
corn           -0.65
Niet           -0.64
stall          -0.63
deal           -0.63
Availability   -0.63
Bran           -0.61
Positive Logits
Token          Logit
thereto         1.04
materially      0.86
towards         0.85
positively      0.84
itives          0.82
generously      0.81
toward          0.80
contributions   0.79
greatly         0.77
immensely       0.77
Activations Density: 0.035%