INDEX
Explanations
intellectual vs. experiential understanding
Negative Logits
Token    Logit
ens      1.41
can      1.31
ре       1.24
ure      1.16
cou      1.10
ap       1.09
ac       1.08
ay       1.08
au       1.08
oc       1.07
Positive Logits
Token         Logit
shameful      0.91
spectacles    0.86
público       0.85
);            0.84
AZIONE        0.84
platform      0.82
merge         0.82
getDate       0.82
arguments     0.82
blend         0.81
Activations Density: 0.005%