INDEX
Explanations
specific phrases or terms related to content updates and features on websites and applications
common phrases
the neuron responds to content-bearing or topical words (important nouns, verbs, pronouns and discourse markers) rather than function or filler tokens.
Negative Logits
ProtoMessage  -0.71
setVerticalGroup  -0.67
ویکیپدی  -0.65
հղումներ  -0.61
oneofs  -0.59
Савезне  -0.59
tiguan  -0.58
Personendaten  -0.58
WebElementEntity  -0.58
saites  -0.58
Positive Logits
Masyarakat  0.29
nahilalakip  0.28
eletrônico  0.27
Erscheinung  0.26
محفوظة  0.26
akik  0.26
jezelf  0.26
Year  0.25
jspb  0.25
usage  0.25
Activations Density 0.160%
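
The density figure above is commonly read as the fraction of sampled tokens on which the feature fires. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the sample data are assumptions for illustration, not taken from the dashboard or its tooling.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 tokens, 16 of which activate the feature.
sample = [0.0] * 9984 + [1.5] * 16
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.160%
```

With this definition, a density of 0.160% means that roughly 16 in every 10,000 sampled tokens produce a non-zero activation.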