Explanations
words related to restructuring or destruction
terms related to destruction and structural collapse
Negative Logits
Leone       -0.72
>>>>>>>>    -0.64
Wilde       -0.59
pora        -0.57
background  -0.56
steep       -0.56
metab       -0.55
Alban       -0.55
Krishna     -0.54
referen     -0.54
Positive Logits
ures        1.19
ured        1.03
ruct        1.02
ible        0.99
ibles       0.99
ural        0.97
urally      0.95
uring       0.94
urous       0.86
alore       0.86
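Feature dashboards of this kind commonly produce the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking the per-token scores. This page does not state how its values were computed, so the sketch below assumes that convention. The function names direct_logit_effects and top_logits, and the toy numbers in the usage example, are hypothetical.

```python
def direct_logit_effects(decoder_dir, unembed, vocab):
    """Score each vocabulary token by the dot product of the feature's
    decoder direction (length d_model) with that token's unembedding column.

    Assumption: unembed is a d_model x vocab_size nested list.
    """
    effects = []
    for j, token in enumerate(vocab):
        score = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        effects.append((token, score))
    return effects


def top_logits(effects, k, positive=True):
    """Return the k tokens with the largest scores (positive=True)
    or the k most negative scores (positive=False)."""
    return sorted(effects, key=lambda te: te[1], reverse=positive)[:k]


# Hypothetical toy example: d_model = 2, four-token vocabulary.
vocab = ["ures", "ruct", "Leone", "steep"]
decoder_dir = [1.0, 0.5]
unembed = [
    [1.0, 0.8, -0.5, 0.1],
    [0.4, 0.2, -0.4, -0.6],
]
effects = direct_logit_effects(decoder_dir, unembed, vocab)
print(top_logits(effects, 2))                  # highest-scoring tokens
print(top_logits(effects, 2, positive=False))  # most negative tokens
```

Sorting once per direction keeps the sketch short; with a real vocabulary of tens of thousands of tokens, a partial sort (for example heapq.nlargest) would avoid ordering the whole list.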
Activation Density: 0.032%
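Activation density is usually reported as the percentage of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0; the function name activation_density and the toy activation list are hypothetical. It shows that 0.032% corresponds to roughly 32 firing positions per 100,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds threshold.

    Assumption: a feature counts as active when its activation is > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical toy data: 32 firing positions out of 100,000 tokens.
acts = [0.0] * 99_968 + [1.5] * 32
print(f"{activation_density(acts):.3f}%")
```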