INDEX
Explanations
repeated instances of the letter 'o' in close proximity
Negative Logits
theless         -0.75
detrim          -0.69
glim            -0.68
Salary          -0.65
Extras          -0.61
REDACTED        -0.61
disson          -0.60
代              -0.60
deficiencies    -0.59
Expend          -0.59

Positive Logits
opy              1.42
pper             1.23
gged             1.07
gging            1.07
opers            1.04
aks              1.04
zzi              1.04
ppo              1.04
ppy              1.04
veland           1.03
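Lists like the Negative Logits and Positive Logits above are commonly produced by scoring every vocabulary token by the dot product of the feature's decoder direction with that token's unembedding column, then taking the top and bottom k. The sketch below assumes exactly that plain dot product; the dashboard's own computation (e.g. any normalization) is not stated here, and the names `direct_logit_effects`, `top_k`, and the toy matrices are hypothetical.

```python
from typing import List, Sequence, Tuple


def direct_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> List[Tuple[str, float]]:
    """Score each token by decoder_dir . unembed[:, token] (assumed setup).

    decoder_dir: the feature's decoder direction, length d_model (hypothetical).
    unembed: d_model rows x vocab_size columns (hypothetical unembedding matrix).
    """
    d_model = len(decoder_dir)
    if len(unembed) != d_model:
        raise ValueError("unembed must have d_model rows")
    scores = []
    for j, token in enumerate(vocab):
        # Dot product of the decoder direction with this token's column.
        score = sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        scores.append((token, score))
    return scores


def top_k(scores: List[Tuple[str, float]], k: int, positive: bool = True):
    """Return the k most positive (or, with positive=False, most negative) scores."""
    return sorted(scores, key=lambda item: item[1], reverse=positive)[:k]


# Toy example with d_model = 2 and a three-token vocabulary.
decoder_dir = [1.0, -0.5]
unembed = [[2.0, 0.0, -1.0],
           [0.0, 1.0, 1.0]]
vocab = ["a", "b", "c"]
scores = direct_logit_effects(decoder_dir, unembed, vocab)
print(top_k(scores, 1))                  # [('a', 2.0)]
print(top_k(scores, 2, positive=False))  # [('c', -1.5), ('b', -0.5)]
```

Sorting the full score list is fine for a toy vocabulary; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` avoids a full sort.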
Activation Density: 0.020%
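A density of 0.020% means the feature is active on roughly 2 of every 10,000 tokens (about 1 in 5,000). The sketch below assumes density is the percentage of token positions whose activation is strictly greater than zero; the dashboard's exact threshold and sampling are not stated here, and the `activation_density` helper and sample data are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption, not the dashboard's definition.
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation")
    active = sum(1 for a in values if a > threshold)
    return 100.0 * active / len(values)


# Hypothetical sample: the feature fires at 2 of 10,000 token positions.
sample = [0.0] * 10_000
sample[17] = 3.2
sample[4242] = 0.9
print(f"{activation_density(sample):.3f}%")  # 0.020%
```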