Posts

Showing posts from February, 2024

Legal Text Analysis Project: Leveraging NLP for Enhanced Understanding and Insight

A key point in the development of NLP solutions is the consideration of language proficiency. The terms and jargon used in specialized sectors, such as the medical sector, differ greatly from those used in the financial sector, which in turn are not comparable to those in the legal sector. Legal texts present a series of particular challenges:
- The breadth of the domain in terms of textual typology.
- The variety of target groups.
- The linguistic features of the domain; moreover, legal terminology not only covers the domain itself but also tends to co-occur with terminology from all other areas.
- The limited number of NLP resources and tools tailored to the domain as a whole.
- The predominance of English, since most of the available resources and tools are developed for processing texts in English.
- The slower adoption of smart technologies in the legal and administrative sector compared with sectors such as the biomedical or financial one.
- The heterogen...

Optimizing Resource Consumption: A Predictive System Utilizing Infrastructure and App Signals

In today's dynamic digital landscape, the ability to manage resources efficiently is paramount for businesses and organizations. With the ever-growing demand for online services, predicting resource consumption accurately has become a challenging yet essential task. To address this challenge, we introduce a cutting-edge predictive system that leverages infrastructure and application signals to forecast resource needs in the upcoming hours.

Understanding the Challenge: Resource management, whether it pertains to cloud computing, server infrastructure, or network bandwidth, requires foresight to ensure optimal performance and cost-effectiveness. Traditional methods of resource provisioning often rely on historical data or manual adjustments, which can lead to inefficiencies and unnecessary expenditure. Moreover, sudden spikes in demand or unexpected events can exacerbate the problem, resulting in downtime or degraded service quality.

Introducing Our Solution: Our predictive s...
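A baseline version of this forecasting idea can be sketched in a few lines. This is an illustrative assumption rather than the post's actual model: it blends the historical hourly average of an infrastructure signal with the most recent observation.

```python
from collections import defaultdict
from statistics import mean

def hourly_baseline(samples):
    """samples: list of (hour_of_day, usage) pairs collected over past days."""
    buckets = defaultdict(list)
    for hour, usage in samples:
        buckets[hour].append(usage)
    # Average usage seen historically at each hour of the day
    return {hour: mean(vals) for hour, vals in buckets.items()}

def forecast(baseline, hour, recent, weight=0.5):
    """Blend the historical average for `hour` with the latest observation."""
    return weight * baseline[hour] + (1 - weight) * recent

# Hypothetical CPU-utilisation samples (hour, percent)
history = [(9, 40.0), (9, 44.0), (10, 60.0), (10, 64.0)]
baseline = hourly_baseline(history)
print(round(forecast(baseline, 10, recent=70.0), 1))  # blends 62.0 with 70.0
```

A real system would of course use richer models and more signals, but the blend of "what usually happens at this hour" with "what is happening now" is the intuition.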

Real-Time Intruder Detection System Using Device App Signals and Python Distance Calculation

In today's interconnected world, safeguarding digital identities and protecting user accounts from unauthorized access is paramount. To address this critical challenge, we've developed an innovative real-time intruder detection system that leverages device application signals and Python-based distance calculation algorithms.

Key Components of the System:
- App Signal Monitoring: The system continuously monitors application signals emitted by devices associated with user accounts, including user interactions, login attempts, and application usage patterns.
- User Behavior Profiling: By analyzing the historical app signals associated with each user account, the system builds a comprehensive profile of typical user behavior. This profile serves as a baseline for detecting deviations that may indicate unauthorized access or intruder activity.
- Distance Calculation Algorithm: Utilizing Python-based distance calculation algorithms, the system quantifies the similarity between...
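The distance step can be illustrated with a small sketch. The feature choice (login hour, actions per minute, distinct apps) and the threshold are assumptions for illustration only; the post does not specify the exact metric used.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def is_intruder(profile, session, threshold=0.03):
    """Flag a session whose behaviour vector drifts too far from the profile."""
    return cosine_distance(profile, session) > threshold

# Hypothetical features: [login hour, actions/min, distinct apps]
profile = [9.0, 30.0, 4.0]     # typical behaviour for this account
normal  = [10.0, 28.0, 4.0]    # small drift: same person
odd     = [3.0, 120.0, 15.0]   # 3 a.m. burst across many apps
print(is_intruder(profile, normal), is_intruder(profile, odd))
```

Cosine distance is scale-invariant, so in practice the threshold must be tuned per feature set; Euclidean distance over normalized features is a common alternative.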

User Experience and Device Monitoring Project

The question: Getting reliable indicators of the user experience of applications has been a challenge ever since the use of IT became widespread. Moreover, with the growth of cloud services and the virtualization of the workplace, ever more factors affect the end-user experience, and it is therefore increasingly difficult to obtain reliable indicators of whether an interaction is adequate or not. Currently, different metrics are used, ranging from direct user surveys to indicators based on technical parameters assumed to affect the user experience positively or negatively. Surveys have the drawback that they depend on an action on the part of the user and are therefore subjective, while metric-based indicators fail to cover all possible factors and so end up generating many false negatives (the indicators show "green" but the user complains). With this project we propose to add an indicator based on the user...

Enhancing Citrix Session Analysis with Real-Time Monitoring System

In today's dynamic business landscape, ensuring seamless performance and an optimal user experience within Citrix environments is paramount. To address this need, I've developed a sophisticated system leveraging the Citrix APIs, specifically the Get-BrokerSession and Get-BrokerApplication signals. The system operates in real time, continuously monitoring Citrix sessions and extracting valuable insights every minute. Its core functionality revolves around reading these signals and meticulously logging session activity, whether sessions are in an active or terminated state. At the heart of this solution lies Elasticsearch, a robust data indexing and search engine; by integrating with Elasticsearch, the system ensures that every session event is efficiently recorded and stored for comprehensive analysis.

This real-time monitoring system offers several key advantages:
- Immediate Visibility: Gain instant visibility into Citrix ses...
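The real system reads these signals through the Citrix PowerShell cmdlets; the per-minute change-detection logic it implies can be sketched in Python. Session IDs, states, and the event shapes below are illustrative assumptions, not the actual schema.

```python
def diff_sessions(previous, current):
    """Compare two snapshots (dicts of session_id -> state) taken a minute
    apart and return events for sessions that started, changed, or ended."""
    events = []
    for sid, state in current.items():
        if sid not in previous:
            events.append((sid, "started", state))
        elif previous[sid] != state:
            events.append((sid, "changed", state))
    for sid, state in previous.items():
        if sid not in current:
            events.append((sid, "terminated", state))
    return events

# Hypothetical snapshots from two consecutive polling cycles
snap_t0 = {"s1": "Active", "s2": "Active"}
snap_t1 = {"s1": "Disconnected", "s3": "Active"}
for event in diff_sessions(snap_t0, snap_t1):
    print(event)  # each event would be indexed into Elasticsearch
```

Logging deltas rather than full snapshots keeps the Elasticsearch index compact while still reconstructing every session's lifecycle.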

Health Data Interoperability: OMOP

The Observational Medical Outcomes Partnership (OMOP) was a public-private partnership established to inform the appropriate use of observational healthcare databases for studying the effects of medical products. Over the course of the 5-year project and through its community of researchers from industry, government, and academia, OMOP successfully achieved its aims to: 1) conduct methodological research to empirically evaluate the performance of various analytical methods on their ability to identify true associations and avoid false findings, 2) develop tools and capabilities for transforming, characterizing, and analyzing disparate data sources across the health care delivery spectrum, and 3) establish a shared resource so that the broader research community can collaboratively advance the science. The results of OMOP's research have been widely published and presented at scientific conferences, including the annual OMOP Symposium. The OMOP Legacy continues...

CRISP-DM

Business Understanding: Understand the business objectives and define the project's success criteria. Determine how data analysis and Machine Learning can contribute to these objectives. (Business objectives; assessing the current situation; data mining goals; project plan.)

Data Understanding: Collect and analyze the available data to identify their quality, relevance, and limitations. Perform exploratory data analysis to gain a deeper understanding. (Data acquisition; data description; data exploration; quality verification and management.)

Data Preparation: Preprocess and clean the data for analysis. This includes handling missing values, normalization, and transformation of the data as needed. (Data selection; data cleaning; dataset construction; data integration; data formatting.)

Modeling: Select and apply modeling techniques...

Artificial Intelligence Framework: Moriarty

In consulting and data science, it is valuable to use artificial intelligence frameworks that allow code reuse across projects for different clients. At Everis I had the opportunity to work with the Moriarty framework, integrating NLP Python modules. Moriarty is a tool for generating Big Data near-real-time analytics solutions (Streaming Analytics). It enables collaboration between the data scientist and the software engineer: through Moriarty, they join forces for the rapid generation of new software solutions. The data scientist works with algorithms and data transformations using a visual interface, while the software engineer works with the idea of services to be invoked. The underlying idea is that a user can build Artificial Intelligence and Data Analytics projects without writing a single line of code. The tool's main strength is reducing the 'time to market' of an application which embeds complex algorithms of Artifici...

Advancing Healthcare Insights with a Biomedical NLP Pipeline for Clinical Records from Catalonia Hospitals

In the realm of healthcare data analytics, extracting meaningful insights from clinical records is paramount for driving medical research, improving patient care, and optimizing healthcare delivery. To address this imperative, we've developed an innovative system tailored for processing clinical records sourced from hospitals across Catalonia. The system harnesses the power of Natural Language Processing (NLP) to unlock valuable insights from unstructured clinical text.

Key Components of the System:
- Data Ingestion: The system aggregates clinical records from multiple hospitals in Catalonia, ensuring comprehensive coverage of patient data.
- Language Identification: The first step in the NLP pipeline is identifying the language of the clinical text. This is crucial for the subsequent processing steps and ensures accurate analysis, particularly in multilingual regions like Catalonia.
- Tokenization: The text is then tokenized, breaking it down into individual words or tokens. Th...
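The first two pipeline steps can be sketched as follows. The tiny stopword lists and the regex tokenizer are simplifying assumptions for illustration; a production pipeline would use trained language-identification models and a clinical tokenizer.

```python
import re

# Toy stopword samples for Spanish (es) and Catalan (ca)
STOPWORDS = {
    "es": {"el", "la", "los", "de", "que", "y", "en", "del"},
    "ca": {"el", "la", "els", "de", "que", "i", "en", "amb"},
}

def tokenize(text):
    """Lowercase and split on word characters (Unicode-aware)."""
    return re.findall(r"\w+", text.lower(), flags=re.UNICODE)

def identify_language(text):
    """Pick the language whose stopwords overlap the text the most."""
    tokens = set(tokenize(text))
    scores = {lang: len(tokens & sw) for lang, sw in STOPWORDS.items()}
    return max(scores, key=scores.get)

note = "El pacient ingressa amb febre i tos"   # hypothetical clinical note
print(identify_language(note), tokenize(note)[:3])
```

Stopword overlap is a classic lightweight heuristic for closely related languages like Spanish and Catalan, which is exactly the ambiguity this pipeline must resolve.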

Real-time Patient Health Dashboard: Transforming Healthcare Through Data Visualization

I worked on a project at Fujitsu with the San Carlos Hospital to develop dashboards over electronic health record (HCE) data using Power BI and Kibana.

Streamlining Log Analysis with Logstash, Filebeat, and Elasticsearch

In the realm of IT infrastructure management, effectively monitoring and analyzing application logs is crucial for maintaining system reliability and performance. To address this imperative, I've devised a robust system harnessing the capabilities of Logstash, Filebeat, and Elasticsearch. At its core, this system is designed to seamlessly ingest, process, and index application logs into Elasticsearch for comprehensive analysis and visualization. The key components of this integrated solution are:
- Log Ingestion with Filebeat: Filebeat serves as the lightweight shipper responsible for tailing application log files and forwarding them to Logstash for processing. Its efficient design ensures minimal resource overhead while guaranteeing real-time log collection from diverse sources.
- Data Processing with Logstash: Logstash acts as the central processing engine, facilitating data transformation, enrichment, and filtering before indexing it into Elastics...
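A minimal Logstash pipeline of the shape described might look like this. The port, grok pattern, and index name are illustrative assumptions, not the actual deployment's configuration:

```conf
input {
  beats { port => 5044 }   # Filebeat ships tailed log lines here
}
filter {
  # Parse a hypothetical "timestamp LEVEL message" log line
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
  # Use the parsed timestamp as the event time
  date { match => ["ts", "ISO8601"] }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"   # daily indices for easy retention
  }
}
```

On the Filebeat side, only a `filebeat.inputs` path list and a `output.logstash` host pointing at the port above are needed.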

Real-time Sales Deviation Prediction System for Retail Sector

In the dynamic landscape of the retail sector, staying ahead of sales deviations is paramount for maintaining profitability and customer satisfaction. To address this challenge, I spearheaded a groundbreaking project aimed at predicting sales deviations in real-time across multiple countries, leveraging cutting-edge technology and predictive analytics. Utilizing a combination of Couchbase and Spark, we developed robust machine learning models capable of processing vast amounts of data in real-time. Couchbase, with its high-performance NoSQL database capabilities, provided the scalability and flexibility necessary to handle the large volumes of sales data from diverse geographical regions. Meanwhile, Spark's distributed computing framework enabled efficient data processing and model training, ensuring timely insights into sales patterns. The core of our solution lies in the predictive models trained on historical sales data, which continuously analyze incoming sales data to detect d...
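The core deviation test can be illustrated without the distributed machinery. This sketch assumes a simple z-score rule over a sliding window, which is only one of many possible formulations; the production models ran on Spark over data stored in Couchbase.

```python
from statistics import mean, stdev

def is_deviation(history, latest, z_threshold=3.0):
    """Flag `latest` if it lies more than z_threshold standard deviations
    from the mean of the recent sales window."""
    mu, sigma = mean(history), stdev(history)
    return abs(latest - mu) > z_threshold * sigma

# Hypothetical daily sales for one product in one country
window = [100, 104, 98, 102, 101, 99, 103, 97]
print(is_deviation(window, 101), is_deviation(window, 180))
```

In a streaming setting the same test is applied per key (country, store, product), which is where Spark's distributed execution pays off.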

Lifebit-GenomicsEngland: Genomic healthcare infrastructure

Patented technology that enables researchers to run analyses on multiple, distributed datasets in situ and avoid risky movement of highly sensitive data. GenomicsEngland: refining, scaling, and evolving our ability to enable others to deliver genomic healthcare and conduct genomic research.

RD-CONNECT: Infrastructure for Rare Disease Research

RD-CONNECT: Create an integrated global infrastructure for Rare Disease research. Rare disease patients are more likely to receive a rapid molecular diagnosis nowadays thanks to the wide adoption of next-generation sequencing. However, many cases remain undiagnosed even after exome or genome analysis, because the methods used missed the molecular cause in a known gene, or a novel causative gene could not be identified and/or confirmed. To address these challenges, the RD-Connect Genome-Phenome Analysis Platform (GPAP) facilitates the collation, discovery, sharing, and analysis of standardized genome-phenome data within a collaborative environment. Authorized clinicians and researchers submit pseudonymised phenotypic profiles encoded using the Human Phenotype Ontology, and raw genomic data which is processed through a standardized pipeline. After an optional embargo period, the data are shared with other platform users, with the objective that similar cases in the s...

Project DIGEN-1K: System for Genetic Diagnostics and Pathogen Identification through NGS

The question: In 2010 there was an increase in genome sequencing capacity that led to the emergence of numerous projects and the generation of a large amount of data that needed to be analysed and stored. I had the opportunity to work on several projects in the genomic field, at centres such as CNAG and Biomol.

The analysis: To this end, I worked on the development of both analysis pipelines and storage architectures using Big Data technologies.

DIGEN-1K project: Experimental development of a Genetic Diagnostics Method using Next Generation Sequencing.

ADRSpanishTool: A tool for extracting adverse drug reactions and indications with NLP

The question: In the field of research, at the UPM and UC3M, I studied the relationship between drugs and adverse effects in texts. Article link

The analysis: We propose a hybrid method for DDI detection that uses a machine learning approach based on support vector machines and a linguistic approach that combines a simplification method similar to that of Segura-Bedmar et al. (2011), a negation method similar to that of Chowdhury, MFM. and Lavelli, A. (2013), and rules based on the dependency tree provided by the Textalytics eHealth PoS. To obtain the relationships between drugs and their effects, we developed several web crawlers to gather sections describing drug indications and adverse drug reactions from drug package leaflets on the following websites: MedLinePlus, Prospectos.net and Prospectos.org. Once these sections were downloaded, their texts were processed using the Textalytics tool to recognize drugs and their effects....
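A toy version of a negation rule in the spirit of the cited approach can be sketched as follows. The cue list and the "rest of the sentence" scope are simplifying assumptions; the real method used richer cues and the dependency tree.

```python
import re

# Small sample of negation cues (assumption; real cue lists are longer)
NEGATION_CUES = {"no", "not", "without", "never"}

def negated_mentions(sentence, entities):
    """Return the entity mentions that appear after a negation cue."""
    tokens = re.findall(r"\w+", sentence.lower())
    negated = set()
    cue_seen = False
    for tok in tokens:
        if tok in NEGATION_CUES:
            cue_seen = True
        elif cue_seen and tok in entities:
            negated.add(tok)
    return negated

print(negated_mentions("Aspirin did not cause nausea", {"aspirin", "nausea"}))
```

Filtering out negated mentions like this prevents "did not cause nausea" from being extracted as an adverse reaction.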

Metagenomic server: COVER

The question: At the CNB I conducted several lines of research in metagenomics and bacterial metabolic modelling. In any metagenomic project, the coverage obtained for each particular species depends on its abundance. This makes it difficult to determine a priori the amount of DNA sequencing necessary to obtain high coverage for the dominant genomes in an environment. To aid the design of metagenomic sequencing projects, we have developed COVER, a web-based tool that allows the estimation of the coverage achieved for each species in an environmental sample. COVER uses a set of 16S rRNA sequences to produce an estimate of the number of operational taxonomic units (OTUs) in the sample, provides a taxonomic assignment for them, estimates their genome sizes and, most critically, corrects for the number of unobserved OTUs. COVER then calculates the amount of sequencing needed to achieve a given goa...
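The sizing arithmetic that COVER automates can be illustrated with a back-of-envelope, abundance-weighted estimate. This deliberately ignores the unobserved-OTU correction that is the tool's key contribution, and the function name is an assumption for illustration.

```python
def sequencing_needed(genome_size_bp, abundance, target_coverage):
    """Total base pairs to sequence so that one species reaches target_coverage.

    abundance: fraction of the community's DNA belonging to this species (0-1).
    Only `abundance` of every sequenced base pair lands on this genome, so the
    naive requirement (coverage * genome size) is scaled up by 1 / abundance.
    """
    return target_coverage * genome_size_bp / abundance

# A hypothetical 4 Mbp genome at 10% abundance, aiming for 5x coverage:
total_bp = sequencing_needed(4_000_000, 0.10, 5)
print(f"{total_bp:.0f} bp")  # roughly 2e8 bp of total sequencing
```

The rarer the species, the more the 1/abundance factor dominates, which is why the correction for unobserved OTUs matters so much in practice.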