INDEX
Explanations
numerical values and percentages mentioned as a figure or concept in the text
Negative Logits
tranquil     -0.63
avorite      -0.63
sclerosis    -0.62
repay        -0.62
haun         -0.60
merry        -0.60
regist       -0.59
Cause        -0.58
DEBUG        -0.58
discour      -0.58
Positive Logits
.,           1.26
.:           0.89
ross         0.82
.).          0.79
eter         0.79
aminer       0.77
hey          0.77
emonic       0.77
raphics      0.76
.,"          0.75
Activations Density: 0.010%