INDEX
Explanations
The neuron detects tokens in non-Latin-script or otherwise non-English text (e.g., Cyrillic or other foreign-language segments).
Negative Logits
Token       Value
ases        0.35
degrade     0.33
airbags     0.31
reduzir     0.31
grinder     0.31
magnets     0.30
atualizar   0.30
market      0.29
डेट         0.29
ικο         0.29
Positive Logits
Token       Value
Furthermore 0.29
Moreover    0.29
Scienze     0.28
↵           0.28
Finally     0.27
Finally     0.27
Paragraph   0.26
សូម         0.26
Finalmente  0.26
Merriam     0.26
Activations Density: 0.346%
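
For readers unfamiliar with the density figure, the sketch below shows one common way such a number could be computed: the percentage of sampled tokens on which the neuron's activation exceeds a threshold. This is an assumption about the metric, not the dashboard's confirmed definition; the function name `activation_density`, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    neuron fires at all; the dashboard's exact definition is not given.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 1000 tokens, of which 3 activate the neuron.
sample = [0.0] * 997 + [0.4, 1.2, 0.7]
print(f"{activation_density(sample):.3f}%")  # prints "0.300%"
```

Under this reading, a density of 0.346% would mean the neuron fires on roughly 3 to 4 of every 1000 sampled tokens.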