INDEX
Explanations
words related to breaking, destruction, or failure
terms related to destruction or breaking
Negative Logits
ature            -0.75
izations         -0.74
tarians          -0.72
uters            -0.70
inference        -0.69
oaded            -0.68
FactoryReloaded  -0.67
pmwiki           -0.66
iked             -0.65
abet             -0.64
Positive Logits
stal             0.97
stals            0.87
ãĤ©              0.84
shards           0.79
Shards           0.78
illusions        0.76
shattered        0.76
shatter          0.74
blue             0.73
IRE              0.72
Activations Density 0.017%
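
The positive-logit token `ãĤ©` is probably not corrupted text. It looks like a token shown in the byte-to-character alphabet used by GPT-2-style byte-level BPE tokenizers. The sketch below assumes that mapping applies here, which the dashboard does not state, and converts such a token string back to the UTF-8 text it stands for. The names `bytes_to_unicode` and `decode_token` are illustrative helpers, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Byte -> printable-character table in the style of GPT-2's byte-level BPE.

    Assumption: the tokens on this page were produced with this mapping.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every other byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


def decode_token(token: str) -> str:
    """Map each display character back to its byte, then decode as UTF-8."""
    reverse = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(reverse[ch] for ch in token)
    # errors="replace" keeps partial multi-byte tokens from raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĤ©", "shards", "stal"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, `ãĤ©` corresponds to the bytes E3 82 A9, which is the UTF-8 encoding of the katakana character ォ (U+30A9). Plain ASCII tokens such as `shards` pass through unchanged.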