INDEX
Explanations
references to bugs in software or systems
Negative Logits
Token        Logit
ACP          -0.87
yss          -0.83
uclear       -0.75
ometown      -0.71
NAS          -0.71
amina        -0.70
ining        -0.69
ager         -0.67
minist       -0.66
ographic     -0.66
Positive Logits
Token        Logit
Bunny        1.03
hooting      0.95
bugs         0.94
patched      0.92
bugs         0.91
pots         0.89
Bugs         0.86
afety        0.84
glitches     0.82
lash         0.79
Activations Density: 0.013%