INDEX
Explanations
references to specific numerical rankings or positions
Negative Logits
destro       -0.69
totality     -0.69
muted        -0.68
endish       -0.63
conj         -0.61
lyn          -0.61
composing    -0.61
awoken       -0.60
cruc         -0.59
suppressed   -0.58
Positive Logits
xious        1.02
zzle         0.97
obs          0.87
isy          0.83
emi          0.83
AH           0.83
onday        0.82
Such         0.79
ises         0.78
DIV          0.78
Activations Density: 0.016%