INDEX
Explanations
references to specific numerical values or identifiers
Negative Logits
Efq         -1.20
Monfieur    -1.02
whoſe       -1.02
Jefus       -0.98
Eſ          -0.95
Theſe       -0.91
Houſe       -0.85
raiſ        -0.85
greateſt    -0.85
―――――       -0.84
Positive Logits
I           0.58
i           0.56
<eos>       0.55
my          0.54
↵↵          0.49
Mo          0.46
(blank)     0.46
↵           0.46
</h4>       0.45
</i>        0.44
Activations Density 0.018%
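
A minimal sketch of how a density figure like the one above can be read, assuming it means the fraction of sampled tokens on which the feature's activation is above zero. The function name `activation_density`, the threshold, and the sample data are hypothetical and chosen only to reproduce a value of 0.018%; they are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" counts a token as active when its activation
    is strictly greater than the threshold (zero by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, of which 18 activate the feature.
sample = [0.0] * 99_982 + [1.5] * 18
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.018%
```

Under this reading, a density of 0.018% corresponds to roughly 18 active tokens per 100,000 sampled, which is a very sparse feature.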