
Data Scientist
- Ashburn, VA
- Permanent
- Full-time
- Perform hands-on analysis and modeling involving the creation of intervention hypotheses and experiments, assessment of data needs and available sources, determination of optimal analytical approaches, performance of exploratory data analysis, and feature generation (e.g., identification, derivation, aggregation).
- Demonstrate proficiency in extracting, cleaning, and transforming CBP transactional and mission data associated with an identified problem space to build predictive models and develop appropriate supporting documentation.
- Leverage knowledge of a variety of statistical and machine learning techniques and methods to define and develop programming algorithms; train, evaluate, and deploy predictive analytics models that directly inform mission decisions.
- Execute projects, including those intended to identify patterns and/or anomalies in large datasets; perform automated text/data classification and categorization, as well as entity recognition, resolution, and extraction; and perform named-entity matching.
- HS Diploma/GED and 6 or more years of experience; AS/AA and 4 or more years; BS/BA and 2 or more years; or MS/MA/MBA or PhD/Doctorate
- Experience in developing machine learning models and applying advanced analytics solutions to solve complex business problems
- Experience with programming languages including: R, Python, Scala, Java.
- Experience with SQL programming
- Experience constructing and executing queries to extract data in support of EDA and model development
- Experience with unsupervised and supervised machine learning techniques and methods
- Experience performing data mining, analysis, and training set construction
- Proficiency with statistical software packages including: SAS, SPSS Modeler, R, WEKA, or equivalent
- Proficiency with Unsupervised Machine Learning methods including Cluster Analysis (e.g., K-means, K-nearest Neighbor, Hierarchical, Deep Belief Networks, Principal Component Analysis), Segmentation, etc.
- Proficiency with Supervised Machine Learning methods including Decision Trees, Support Vector Machines, Logistic Regression, Random/Rotation Forests, Categorization/Classification, Neural Nets, Bayesian Networks, etc.
- Experience with pattern recognition and extraction, automated classification and categorization, and entity resolution (e.g., record linking, named-entity matching, deduplication/disambiguation)
- Experience with visualization tools and techniques (e.g., Periscope, Business Objects, D3, ggplot, Tableau, SAS Visual Analytics, PowerBI)
- Experience with big data technologies (e.g., Hadoop, HIVE, HDFS, HBase, MapReduce, Spark, Kafka, Sqoop)
- Must be a U.S. citizen with the ability to obtain DHS Customs and Border Protection (CBP) suitability.
- The person in this position occasionally needs to move about inside the office to access file cabinets and office machinery, or to communicate with co-workers, management, and customers, which may involve delivering presentations.