Explanations
instances of parentheses
Negative Logits
Token      Logit
Dol        -0.75
Goy        -0.74
lat        -0.68
shl        -0.66
col        -0.63
Thy        -0.63
fulness    -0.62
Ad         -0.62
émon       -0.62
Hoy        -0.62
Positive Logits
Token      Logit
])         1.64
}))        1.62
})         1.50
>)         1.50
)])        1.49
]")]       1.47
]))        1.46
))         1.45
}))        1.45
'])        1.45
Activations Density: 0.838%
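
As a rough illustration of how a density figure like the one above could be obtained, here is a minimal sketch. It assumes that "density" means the fraction of token positions on which the feature's activation is above zero; the dashboard does not state its definition. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density = (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for illustration only (not taken from the dashboard).
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.7, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")  # 2 of 8 positions are active -> 25.000%
```

Under this assumed definition, a density of 0.838% would mean the feature is active on fewer than 1 in 100 token positions.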