Explanations
instances where credit is being given or deserved
Negative Logits
nel            -0.69
atos           -0.64
chan           -0.61
icz            -0.61
contracting    -0.60
IRE            -0.59
viol           -0.56
entangled      -0.55
inj            -0.55
Alter          -0.55
Positive Logits
ability        0.75
giving         0.75
worthiness     0.74
abilities      0.74
ibly           0.73
itism          0.71
cards          0.70
credits        0.70
ably           0.69
orable         0.69
Activations Density 0.039%