Data Software Engineering Daily

Synopsis

Databases and data engineering episodes of Software Engineering Daily

Episodes

  • API Change Management with Aidan Cunniffe

    02/09/2020 Duration: 42min

    APIs within a company change all the time. Every service owner has an API to manage, and those APIs have upstream and downstream connections. APIs need to be tested for integration points as well as for their “contract”, the agreement between an API owner and the consumers of that API. Aidan Cunniffe is the founder…

  • Data Version Control with Dmitry Petrov

    24/08/2020 Duration: 54min

    Code is version controlled through git, the version control system originally built to manage the Linux codebase. For decades, software has been developed using git for version control. More recently, data engineering has become an unavoidable facet of software development. It is reasonable to ask: why are we not version controlling our data? Dmitry Petrov is…

  • Ray Applications with Richard Liaw

    24/07/2020 Duration: 54min

    Ray is a general-purpose distributed computing framework. At a low level, Ray provides fault-tolerant primitives that support applications running across multiple processors. At a higher level, Ray supports scalable reinforcement learning, including the common problem of hyperparameter tuning. In a previous episode, we explored the primitives of Ray as well as Anyscale, the business…

  • Modin: Pandas Scalability with Devin Petersohn

    23/07/2020 Duration: 58min

    Pandas is a Python data analysis library, and an essential tool in data science. Pandas allows users to load large quantities of data into a data structure called a dataframe, over which the user can call mathematical operations. When the data fits entirely into memory this works well, but sometimes there is too much data…

  • Sourcegraph: Code Search and Intelligence with Beyang Liu

    22/07/2020 Duration: 59min

    A large codebase cannot be searched with naive indexing algorithms. To search through a codebase the size of Uber’s, it is necessary to build a much more sophisticated indexing system than simple text search. Sourcegraph is a system for universal code search. It allows developers to more easily onboard to a new…

  • ADP Engineering with Tim Halbur

    17/07/2020 Duration: 55min

    ADP has been around for more than 70 years, fulfilling payroll and other human resources services. Payroll processing is a complex business, involving the movement of money in accordance with regulatory and legal strictures. From an engineering point of view, ADP has decades of software behind it, and a bright future of a platform company…

  • Chronosphere: Scalable Metrics Database with Rob Skillington

    09/07/2020 Duration: 41min

    M3 is a scalable metrics database originally built to host Uber’s rapidly growing data storage from Prometheus. When Rob Skillington was at Uber, he helped design, implement, and deploy M3. Since leaving Uber, he has co-founded a company around a hosted version of M3 called Chronosphere. If you have access to a scalable metrics database, …

  • DynamoDB with Alex DeBrie

    02/07/2020 Duration: 01h01min

    DynamoDB is a managed NoSQL database service from AWS. It is widely used as a transactional database to fulfill key-value and wide-column data models. In a previous show with Rick Houlihan, we explored how to build a data model and optimize the query patterns for a NoSQL database. Today’s show is about DynamoDB specifically: partitioning, …

  • Snowplow Analytics: Data Collection Platform with Alex Dean

    01/07/2020 Duration: 57min

    As a user browses a webpage, that browser session generates events that need to be recorded, validated, enriched, and stored. This data is sometimes called customer data infrastructure, or CDI. This data requires a full stack of different tools: a system on the frontend to collect the data, middleware to transport the data, and backend…

  • Postman: API Development with Abhinav Asthana

    30/06/2020 Duration: 55min

    A software company manages and interacts with hundreds of APIs. These APIs require testing, performance analysis, authorization management, and release management. In a word, APIs require collaboration. Postman is a system for API collaboration. It allows users to test APIs with collections of requests, monitor the API responses, and visualize the query results. Users of…

  • Data Intensive Applications with Martin Kleppmann (Summer Break Repeat)

    23/06/2020 Duration: 01h04min

    Originally published May 2, 2017. We are taking a few weeks off. We’ll be back soon with new episodes. A new programmer learns to build applications using data structures like a queue, a cache, or a database. Modern cloud applications are built using more sophisticated tools like Redis, Kafka, or Amazon S3. These tools do…

  • Redis with Alvin Richards (Summer Break Repeat)

    18/06/2020 Duration: 53min

    Originally published October 24, 2019. We are taking a few weeks off. We’ll be back soon with new episodes. Redis is an in-memory database that persists to disk. Redis is commonly used as an object cache for web applications. Applications are composed of caches and databases. A cache typically stores the data in memory, and…

  • Apache Airflow with Maxime Beauchemin, Vikram Koka, and Ash Berlin-Taylor

    10/06/2020 Duration: 01h04min

    Apache Airflow was released in 2015, introducing the first popular open source solution to data pipeline orchestration. Since that time, Airflow has been widely adopted for dependency-based data workflows. A developer might orchestrate a pipeline with hundreds of tasks, with dependencies between jobs in Spark, Hadoop, and Snowflake. Since Airflow’s creation, it has powered the…

  • Human in the Loop Data Analytics with Aditya Parameswaran

    09/06/2020 Duration: 48min

    The life cycle of data management includes data cleaning, extraction, integration, analysis and exploration, and machine learning models. It would be great if all of this data management could be handled with automation, but unfortunately that is not an option. For most applications, data management requires a human in the loop. A human in the…

  • Uber’s Data Visualization Tools with Ib Green

    05/06/2020 Duration: 48min

    Uber needs to visualize data on a range of different surfaces. A smartphone user sees cars moving around on a map as they wait for their ride to arrive. Data scientists and operations researchers within Uber study the renderings of traffic moving throughout a city. Data visualization is core to Uber, and the company has…

  • Prisma: Modern Database Tooling with Johannes Schickling

    04/06/2020 Duration: 50min

    A frontend developer querying a backend server typically issues that query through an ORM or a raw database query. Prisma is an alternative to both of these data access patterns, allowing for easier database access through auto-generated, type-safe query building tailored to an existing database schema. By integrating…

  • HoloClean: Data Quality Management with Theodoros Rekatsinas

    02/06/2020 Duration: 01h52s

    Many data sources produce new data points at a very high rate. With so much data, the issue of data quality emerges. Low quality data can degrade the accuracy of machine learning models that are built around those data sources. Ideally, we would have completely clean data sources, but that’s not very realistic. One alternative…

  • Disaggregated Servers with Yiying Zhang

    01/06/2020 Duration: 57min

    Server infrastructure traditionally consists of monolithic servers containing all of the necessary hardware to run a computer. These different hardware components are located next to each other, and do not need to communicate over a network boundary to connect the CPU and memory. LegoOS is a model for disaggregated, network-attached hardware. LegoOS disseminates the traditional…

  • Brex Engineering with Cosmin Nicolaescu

    27/05/2020 Duration: 52min

    Brex is a credit card company that provides credit to startups, mostly companies which have raised money. Brex processes millions of transactions, and uses the data from those transactions to assess creditworthiness, prevent fraud, and surface insights for the users of their cards. Brex is full of interesting engineering problems. The high volume of transactions…

  • ArcGIS: Geographic Information Software with Max Payson

    21/05/2020 Duration: 56min

    Geospatial analytics tools are used to render visualizations for a vast array of applications. Data sources such as satellites and cellular data can gather location data, and that data can be superimposed over a map. A map-based visualization can allow the end user to make decisions based on what they see. ArcGIS is one of…
