INDEX
Explanations
if-then logic and explanations
Negative Logits
dd      0.60
kt      0.58
Ovaj    0.56
k       0.56
got     0.55
img     0.55
tt      0.54
kn      0.53
Hallo   0.53
jsem    0.53
Positive Logits
billionaires    0.58
insurers        0.56
ratepayers      0.55
athletes        0.55
corporations    0.55
pollinators     0.55
automakers      0.53
fintech         0.52
policymakers    0.52
democracies     0.52
Activations Density 0.000%