INDEX
Explanations
mentions of specific named technical or domain entities (proper nouns for software, frameworks, laws, models, etc.).
The neuron activates on mentions of “League of Legends” (i.e. tie-ins to that game).
New Auto-Interp
Negative Logits
'
0.33
could
0.29
might
0.29
comes
0.29
terroir
0.29
becomes
0.28
exudes
0.28
’
0.28
was
0.28
verlieren
0.28
POSITIVE LOGITS
0.25
↵
0.24
Они
0.24
IDENCE
0.24
<unused2135>
0.23
Ото
0.22
ỹ
0.22
После
0.22
د
0.21
Пе
0.21
Activations Density 0.000%