INDEX
Explanations
numerical metrics or quantities in a structured format
Negative Logits
Token       Logit
Archdemon   -0.65
innocence   -0.59
sunset      -0.59
WARN        -0.58
romeda      -0.56
amber       -0.56
condem      -0.56
reviewer    -0.55
appraisal   -0.55
mortar      -0.55
Positive Logits
Token       Logit
lycer       0.85
bish        0.83
portation   0.81
idad        0.74
quist       0.73
ħĭ          0.72
tal         0.71
itte        0.69
edes        0.69
oxide       0.69
Activations Density: 0.044%