INDEX
Explanations
The neuron is activated by sentences indicating the beginning or end of an article or section
sentences emphasizing the concept of truth or factual statements
Negative Logits
peg         -0.61
uese        -0.60
inav        -0.59
Telesc      -0.58
iates       -0.57
Towers      -0.57
iture       -0.55
encounters  -0.54
itches      -0.53
slips       -0.53
Positive Logits
rael        1.13
senal       0.95
nt          0.95
cussion     0.94
hereby      0.92
definitely  0.90
olate       0.88
olation     0.83
not         0.83
unlikely    0.82
Activations Density 0.576%