INDEX
Explanations
words and terms related to specific geographic or cultural identities
poet laureate
the neuron activates on subword tokens that are the initial piece or prefix of a longer word (i.e., beginning-of-word subword stems).
Negative Logits
Pos       -0.48
Tor       -0.48
Phan      -0.45
ком       -0.45
Luc       -0.44
cookie    -0.44
Aps       -0.43
неза      -0.43
Коло      -0.43
Pun       -0.43
Positive Logits
DebuggerNonUser       0.63
complexContent        0.62
initComponents        0.56
parsedMessage         0.54
BeginInit             0.54
GenerationType        0.52
MLLoader              0.52
(token text missing)  0.51
VersionUID            0.50
bootstrapcdn          0.50
Activations Density 0.634%
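
The density figure above is commonly read as the share of token positions on which the neuron fires at all. The sketch below illustrates that reading only. The function name `activation_density`, the zero threshold, and the sample data are hypothetical and are not taken from this page or from the tool that produced it.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (positions with activation > threshold) / (all positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1,000 token positions, 6 of which have a nonzero activation.
sample = [0.0] * 994 + [1.2, 0.4, 3.1, 0.7, 2.2, 0.9]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.600%
```

Under this reading, a density of 0.634% would mean the neuron is active on roughly 6 of every 1,000 tokens in the sampled text.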