INDEX
Explanations
specifically the word "err" or its variations
instances of the word "err" and its variations
Negative Logits
xual       -0.76
eph        -0.69
ciating    -0.68
waves      -0.67
ffield     -0.67
esville    -0.65
etimes     -0.64
bern       -0.64
thood      -0.64
edom       -0.64
Positive Logits
err         1.12
rr          0.87
ModLoader   0.76
anger       0.76
aditional   0.75
rors        0.74
atively     0.72
uary        0.71
Err         0.71
kson        0.70
Activations Density 0.006%
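
The density figure above is presumably the share of sampled token positions on which this feature fires. A minimal sketch of that calculation follows, assuming density means "fraction of positions with activation above a threshold, expressed as a percentage". The function name activation_density, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds threshold.

    Assumption: "density" is the fraction of active positions; the dashboard's
    exact definition is not stated in the source.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 6 active positions out of 100,000 sampled tokens.
sample = [0.0] * 99_994 + [1.3, 0.4, 2.1, 0.9, 0.2, 3.0]
density = activation_density(sample)
print(f"{density:.3f}%")  # prints "0.006%"
```

Under this reading, a density of 0.006% corresponds to roughly 6 active positions per 100,000 tokens, which fits a feature tied to a narrow family of tokens such as "err" and "Err".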