INDEX
Explanations
names with the word "error"
instances of the word "err" and its variations, indicating references to errors or mistakes
Negative Logits
Token      Logit
esville    -0.79
ĺħ         -0.76
creen      -0.74
figure     -0.74
ciating    -0.72
eye        -0.68
cape       -0.68
lucent     -0.66
åĮ         -0.64
stage      -0.64
Positive Logits
Token      Logit
rr         0.99
andom      0.96
idge       0.96
ange       0.94
untled     0.94
ands       0.85
anger      0.85
ilateral   0.82
antly      0.80
ific       0.79
Activations Density: 0.022%