INDEX
Explanations
proper nouns, particularly names and titles
Negative Logits
↵↵              -0.43
Rasmussen       -0.40
XCTAssert       -0.40
-               -0.40
Figue           -0.40
<eos>           -0.39
,               -0.38
Gebir           -0.38
Cardoso         -0.38
Karlsson        -0.37
Positive Logits
:✨              0.92
typed           0.84
typed           0.73
parsedMessage   0.72
Typed           0.71
Typing          0.71
Losses          0.69
ſind            0.69
typing          0.69
<unused43>      0.68
Activations Density 0.235%