Projects completed by our students throughout the Data Science Bootcamp program
In this post, I am going to explain my 4th project at Istanbul Data Science Academy, which was about NLP classification and sentiment analysis. Before getting into the post, I would like to share my GitHub repo; if you are interested, you can find the Jupyter Notebook code there. In this project, I analyse customer reviews of the Bacchanal Buffet in Las Vegas. The Bacchanal Buffet is an open-kitchen restaurant located in Caesars Palace. As data sources, I use Yelp, Foursquare, and Twitter.
In this article, we will do sentiment classification of reviews for London's top 300 restaurants and develop a recommendation system based on restaurant descriptions, summaries, and user comments using cosine similarity. Finally, we will deploy it with Flask. First, let's define the problem. Let's start with the general sentiment picture for the top 300 restaurants: what are people's favourite features of these restaurants, and what do they complain about the most?
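The recommendation step rests on cosine similarity between text vectors. As a minimal stand-in for the project's pipeline, the sketch below computes cosine similarity over raw word counts in pure Python; the restaurant names and descriptions are invented for illustration.

```python
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between two texts, using raw word counts as vectors."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Rank restaurants by similarity of their descriptions to a query text
descriptions = {
    "Dishoom": "indian curry bombay cafe breakfast",
    "Padella": "fresh pasta italian handmade noodles",
}
query = "great handmade pasta italian"
ranked = sorted(descriptions,
                key=lambda name: cosine_similarity(descriptions[name], query),
                reverse=True)
print(ranked[0])  # -> Padella
```

In the real project, TF-IDF weights rather than raw counts would normally be used, but the similarity computation itself is the same.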
Nowadays there are many sources on the Internet that generate immense amounts of daily news. In addition, users' demand for information has been growing continuously, so it is crucial that news is classified so that users can access the information they are interested in quickly and effectively. A machine learning model for automated news classification could then be used to identify the topics of untracked news and/or make individual suggestions based on a user's prior interests. Thus, our aim is to build models that take a news headline and short description as input and output the news category.
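As a rough illustration of such a model, the sketch below joins headline-style texts with a TF-IDF vectorizer and logistic regression, assuming scikit-learn is available; the categories and example texts are made-up stand-ins for the real news dataset.

```python
# Toy headline-classification pipeline: TF-IDF features + logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "stocks rally as markets open higher",        # headline + short description, joined
    "central bank holds interest rates steady",
    "team wins championship final in overtime",
    "star striker scores twice in derby match",
]
train_labels = ["business", "business", "sports", "sports"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

print(model.predict(["bank cuts rates as markets slide"])[0])  # -> business
```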
In this project, I am going to explain machine learning classification algorithms and apply them to the Instacart dataset. Before getting into the post, I would like to share my GitHub repo; if you are interested, you can find the Jupyter Notebook code there. Classification is the process of predicting a class from your input data. Machine learning algorithms broadly fall into two types: supervised learning and unsupervised learning. In brief, if your dataset includes a target (label) feature and you are going to predict that target, this is supervised learning. If your dataset doesn't include a target (label) feature, this is unsupervised learning. In this post, I apply supervised learning because I try to predict which purchase a customer will reorder.
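The supervised/unsupervised distinction above shows up directly in the API: a supervised model is fit on features *and* labels, while an unsupervised one is fit on features alone. A toy contrast, assuming scikit-learn is available (the feature values and labels here are invented):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X = [[0.0], [0.5], [10.0], [10.5]]                       # feature values
y = ["no_reorder", "no_reorder", "reorder", "reorder"]   # known labels

# Supervised: learns from labelled examples, predicts the label.
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.predict([[9.8]])[0])   # -> reorder

# Unsupervised: no labels; it only groups similar rows together.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)                # two cluster ids; the id numbers are arbitrary
```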
Flight delays have become a significant problem for air transportation systems all over the world, and the aviation industry continues to suffer economic losses because of them. According to data from the Bureau of Transportation Statistics (BTS) of the United States, more than 20% of U.S. flights were delayed in 2018. These delays have a severe economic impact in the U.S., equivalent to 40.7 billion dollars per year. Passengers lose time and miss business opportunities or leisure activities, while airlines attempting to make up for delays burn extra fuel, with a larger adverse environmental impact. To alleviate the negative economic and environmental effects of unexpected flight delays, and to balance growing flight demand against growing delays, an accurate prediction of flight delays at airports is needed.
A reservation is a way of arranging your place before going somewhere. Guests indicate the room number, room type, and arrival time in advance, and hotels try to adjust their preparations and the services to be provided accordingly. Being informed about reservations in advance is also critical for hotels in terms of managing expenses. Problem statement: Will the guest cancel the hotel reservation? Approach: 1) Dataset 2) Classification 3) Imbalanced Data
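For step 3, a common remedy for imbalanced data is random oversampling of the minority class. Below is a minimal, stdlib-only sketch on invented booking labels; libraries such as imbalanced-learn or scikit-learn's `resample` offer more robust versions of the same idea.

```python
import random

# Toy bookings: 1 = cancelled, 0 = not cancelled; heavily imbalanced (10% vs 90%).
bookings = [0] * 90 + [1] * 10
random.seed(42)

majority = [b for b in bookings if b == 0]
minority = [b for b in bookings if b == 1]

# Random oversampling: resample the minority class with replacement
# until both classes are the same size.
minority_upsampled = random.choices(minority, k=len(majority))
balanced = majority + minority_upsampled

print(sum(balanced), len(balanced))  # 90 cancelled out of 180 rows
```

In practice the resampling is applied to full feature rows (and only to the training split), not to bare labels as in this sketch.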
In this post, I will tell you about my study using Spotify's spotipy library to predict the popularity of 21,000 tracks from the years 2010–2020.
Hi everyone, in this article I will explain my second project at Istanbul Data Science Academy. As you can see from the title, I work on inflation and interest data. The data is available at https://www.tcmb.gov.tr/ — you just need to merge it! Banks take the inflation rate into account when setting interest rates. In this context, it is possible to estimate the inflation rate from bank interest rates.
Estimating the revenue an online user will generate when entering a website provides huge value to the site. In this story, we will start building a revenue predictor using machine learning techniques. The dataset can be found at UCI — Online Shoppers Intention.
In this story, I will try to describe how I built random forest and decision tree models for my project. If you haven't read Part 1, you can easily access it by clicking here: Part 1. Random Forest: a random forest creates multiple decision trees and combines them to get a more accurate and stable prediction. Instead of searching for the most important feature, it searches for the best feature among a random subset of features. Features of random forests: it runs efficiently on large databases; it can handle thousands of input variables without variable deletion; it has methods for balancing error in class-imbalanced datasets. In Part 1 we did some analysis; now we need to change our data distribution.
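The tree-vs-forest contrast described above can be seen on a tiny invented dataset, assuming scikit-learn is available; a single decision tree and a forest of 100 trees are fit on the same points.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Two clearly separated groups of one-feature samples.
X = [[0], [1], [2], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# The forest averages the votes of 100 bootstrapped trees, which is what
# makes its predictions more stable on noisy data.
print(tree.predict([[1.5]])[0], forest.predict([[11.5]])[0])  # -> 0 1
```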
In this post, I am going to try to answer the question: where should I buy a new property for investment in London? This is my first project at Istanbul Data Science Academy, and it is about Exploratory Data Analysis (EDA). Exploratory Data Analysis is a crucial step in all data science projects. In real-world problems, data can never be used directly in your project: collecting, cleaning, organising, and wrangling the data is a critical part of developing your model. In this project, I spent nearly 80% of my time collecting, cleaning, organising, and wrangling data.
NLP is the science of extracting meaning and learning from text data, and it is one of the most widely used techniques in data science projects. Text data is everywhere; as a consequence, NLP has many application areas. In this context, I decided to do an NLP project on the arXiv data. Going into this project, I aimed to classify the articles' tags and build a recommendation system using each article's summary, title, author, and genre features.
Hello, I mentioned the course I took in my first post. In the first week of the course, we covered pandas, matplotlib, and seaborn (I will share my notes on each one). Then we were asked to do a project within one week. This project is an EDA (Exploratory Data Analysis); it was assigned so we could practice what we learned in the first week.
I completed the 1st week of the bootcamp at Istanbul Data Science Academy. My first project is EDA (Exploratory Data Analysis), and I chose a dataset called The Earthquakes in Turkey between 1910–2017. In recent years we have seen a lot of earthquakes in Turkey, sometimes of lower magnitude but sometimes bigger than usual. That is primarily why I chose this dataset. First, let me explain earthquakes and earthquake types.
Hi, I worked on the 2019 novel coronavirus dataset. At Istanbul Data Science Academy we studied EDA (Exploratory Data Analysis), and afterwards every student made a project on that topic. I made this project after only one week of study. My topic was coronavirus analysis. As for the dataset, 27% of the values in the Country/Region column were null (NaN); I changed the NaN values to "Unknown".
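The NaN replacement mentioned above is a one-liner in pandas. A small sketch on an invented stand-in for the dataset, assuming pandas is available:

```python
import pandas as pd

# Tiny stand-in for the virus dataset: half of Country/Region is missing here.
df = pd.DataFrame({
    "Country/Region": ["China", None, "Italy", None],
    "Confirmed": [100, 5, 20, 3],
})

# Replace the NaN values with the placeholder "Unknown", as in the project.
df["Country/Region"] = df["Country/Region"].fillna("Unknown")
print(df["Country/Region"].tolist())  # ['China', 'Unknown', 'Italy', 'Unknown']
```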
In this new adventure I have set out on to become a data scientist, I aim to share the journey by talking about the 5 projects we will complete during the training and what I learn along the way. Today I will tell you about my first project: an EDA project, prepared using Airbnb data, that answers the question of where to stay in New York.
After the first week at Istanbul Data Science Academy, I presented my Exploratory Data Analysis (EDA) project yesterday. In this post, I will tell you the story behind the project I carried out.
I would like to tell you about my first real project in data science. I carried out this project on the Anaconda–Jupyter platform as my IDE environment. I took my dataset from Kaggle as an e-commerce CSV file.
This article presents the data analysis I did on YouTube statistics. First, you should know that English is not my first language, so I welcome your comments about my writing flaws. So let's start.
In one of my previous posts I touched on data analysis, visualisation, and cleaning; in this blog post I will go a little further and turn to EDA (Exploratory Data Analysis) and building an ML (machine learning) model. Before getting to these stages, let's look at the data to be used and the problem we aim to solve.
Data processing and visualisation are among the topics that newcomers to the data world need to learn. There are detailed steps to follow at this stage, but in this post I will give you an overview by conveying the general picture.
This is a regression project, my second project at Istanbul Data Science Academy. Cryptocurrencies are becoming very popular nowadays. So, what is a cryptocurrency? A cryptocurrency is a digital or virtual currency secured by cryptography, a method of protecting information and communications so that only the intended recipient can view the contents.
In the second project of the first month of our data science academy bootcamp, I built a project on linear regression. Actually, I was going to share this work with you earlier, but I lost a lot of time on web scraping attempts (sites usually cut you off while you are pulling data, so I could not get data from the sites I targeted), and preparing and compiling the project took a long time. To avoid losing more time, I worked with the diamonds dataset on Kaggle.
This data was taken from UCI. It contains the closing values of Borsa Istanbul and 7 international indices between 2009–2011. My aim in this project was to find out how Borsa Istanbul is affected by these 7 different indices. My main goal was to gradually start trying the machine learning algorithms we began learning at the bootcamp and to see what results we get from them. Linear Regression, OLS, Ridge, and cross-validation were the methods I based the model on.
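As a sketch of that workflow, the snippet below scores LinearRegression and Ridge with 5-fold cross-validation on synthetic data, assuming scikit-learn is available; in the real project, X would be the 7 international index values and y the Borsa Istanbul close.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: 100 days, 7 "index" features, a linear target plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 7))
y = X @ np.array([1.5, -0.8, 0.3, 0.0, 2.0, -1.2, 0.7]) + rng.normal(scale=0.1, size=100)

for model in (LinearRegression(), Ridge(alpha=1.0)):
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(type(model).__name__, round(scores.mean(), 3))
    # both R² means are close to 1 on this nearly linear data
```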
As Istanbul Data Science Academy students, we were tasked with building a linear regression model for our second individual project. As an engineering student, I wanted to use this algorithm to solve a problem in the energy field, so I picked a dataset from the U.S. Geological Survey to predict wind turbine capacity. As far as we know, our world has limited resources, so significant transformations are happening this decade. We all know about Tesla, solar panels, and wind turbines, but these renewable energy systems still are not that efficient.
This is a classification project, my third project at Istanbul Data Science Academy. Health care in the United States can indeed be very expensive, and the main purpose of health insurance is to reduce such costs to more reasonable, affordable amounts. I wondered which type of insurance, dental or medical, is preferred according to tobacco usage. I downloaded the data from here and used 3 datasets called Benefits and Cost Sharing, Rate, and Plan Attributes.
To produce solutions to real-world problems, our focus was the airline industry. The aim of our study is to predict, with high probability, on which flights the `overbook` practice, which provides airlines with extra revenue, can be applied. In addition to the tickets for a scheduled flight, airlines sell tickets beyond the aircraft's capacity because passengers miss the plane, arrive late, or fail to check in on time; this practice is called overbooking. The bumped passenger travels on the next flight and, to maintain passenger satisfaction, is compensated with an extra ticket or financially.
I searched for data on a topic that interests me and found the World Happiness Report data on Kaggle. This report is an important survey of the state of global happiness; it looks at happiness in the world today and observes how the new science of happiness explains personal and national variations in happiness. My main questions can be listed as follows: "Why is there such a big difference in happiness scores between countries? Why are some people happier than others? What features affect this?" I applied Exploratory Data Analysis (EDA) to the data to answer these questions.
COVID-19 was confirmed to have reached Turkey on 11 March 2020, and the disease subsequently spread all over the country. As of 14 July 2020, the total number of confirmed cases in the country was over 214,993. Meanwhile, stock prices reflect expectations of future profits, and investors saw the virus dampening economic activity and reducing profits.
In this post, I will tell you about the recommendation system I built on the NYC Airbnb data I used in my first project, the EDA project. I took the dataset from Airbnb's own page. It consisted of 4 different datasets for 2020: guest houses, neighbourhoods, reviews, and prices per guest house. Since the guest-house dataset consisted of 50,246 records and 106 columns, I shrank it by selecting the 3 neighbourhoods with the most guest houses and keeping only the columns I would use for the EDA and the recommendation system.
Web scraping is the process of automatically collecting data from a website. Data on websites is unstructured, held in text-based markup languages (HTML, XML, etc.). Web scraping lets us scrape this data and store it in a structured form. To do this, we need some libraries in Python: one is "BeautifulSoup" and the other is "requests". With requests we send the request to the website; with BeautifulSoup we parse the HTML and extract the information. In this project, I predicted used-car prices using data from second-hand car listings on a car sales site. I sent the necessary requests to the website and went into each listing page to collect specific features of the cars.
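A minimal version of the parsing step might look like the sketch below. It parses a saved HTML snippet instead of a live page, and the tag structure, class names, and listings are invented for illustration; in a live run the HTML would come from `requests.get(url).text`.

```python
from bs4 import BeautifulSoup

# Invented stand-in for one scraped listings page.
html = """
<div class="listing"><span class="title">2015 Ford Focus</span>
  <span class="price">350000 TL</span><span class="km">88000</span></div>
<div class="listing"><span class="title">2018 Renault Clio</span>
  <span class="price">520000 TL</span><span class="km">45000</span></div>
"""

soup = BeautifulSoup(html, "html.parser")
cars = [
    {
        "title": div.find("span", class_="title").text,
        "price": div.find("span", class_="price").text,
        "km": int(div.find("span", class_="km").text),
    }
    for div in soup.find_all("div", class_="listing")
]
print(cars[0]["title"], cars[1]["km"])  # 2015 Ford Focus 45000
```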
If you want to predict something such as price or sales, regression could be a good solution. In this post, I am going to apply a machine learning algorithm to predict football players' market value. Every data science project starts with a problem or question. In my project, the business scenario is: predicting football players' market value and determining who is overvalued or undervalued.
The aim of every data science project is to find solutions to problems. In this data science project, let's assume we are going to rent a house in Sarıyer and want to estimate the rent of a house that matches the criteria we have in mind. How about turning to data science for this? Let's start by laying out our roadmap: 1. Pull listing data from real-estate sites with web scraping. 2. Bring the data into a processable format. 3. Add new features to the data. 4. Build regression models and pick the best one. Here, the results of Linear Regression, Ridge, Lasso, and OLS models were compared.
My ultimate goal is to find a good website that provides data suited to linear regression, scrape it, clean it, do some visualisation, and find the right algorithm to predict the dependent variable. After some research and looking around the internet, I did it🥳. I finally found a good site that matches my goals and intentions for my project: Hürriyet Emlak, a well-known real-estate website in Türkiye that provides a lot of reliable information about the ads listed on its pages; reliability is very important if we want our project to work well. I chose the price of the properties listed for sale as my dependent variable and started working on data about properties located in Istanbul. I will be using Python as my go-to programming language, with Selenium and BeautifulSoup to scrape the data, Pandas and NumPy for data wrangling, Seaborn and Matplotlib to visualise the data, and finally, for the linear regression part, scikit-learn and statsmodels.
The goal of this project is to develop a computational model for predicting movie revenues based on public data extracted from the Boxofficemojo.com online movie database. The first phase is web scraping: different types of features are extracted from Boxofficemojo.com, which will be described later. The second phase is data cleaning: after scraping data from our source, we cleaned it, mainly based on the unavailability of some features. The third phase is exploratory data analysis, where we create graphics to understand the data. The fourth phase is feature engineering, where we create features for the machine learning model from the raw text data. The fifth phase is model analysis, where I applied one of the machine learning algorithms to our dataset.
The project consists of 2 main stages. In stage 1, I used web scraping to obtain the data we would use. According to Wikipedia's definition, web scraping is a computer-program technique for extracting information from websites. In other words, we can summarise it as collecting and processing the data one wants from a website. Of course, web scraping cannot be done on every site, because some sites do not permit it. Stage 2 is data preprocessing, feature engineering, and developing a regression model on the scraped data. I did my project on used-car price prediction. I started by examining the data.
It has been 3 awesome weeks since the Istanbul Data Science Bootcamp started, and finally the time for the first projects has arrived. We were asked to find a dataset that suits our goal and apply Exploratory Data Analysis (EDA) to it to extract insights that help describe the business we intend to focus on. This is a Brazilian e-commerce public dataset of orders made at the Olist Store. The dataset has information about 100k orders from 2016 to 2018 made at multiple marketplaces in Brazil. Its features allow viewing an order from multiple dimensions: from order status, price, payment and freight performance to customer location, product attributes, and finally reviews written by customers. A geolocation dataset that relates Brazilian zip codes to lat/lng coordinates is also included. This is real commercial data: it has been anonymised, and references to the companies and partners in the review text have been replaced with the names of Game of Thrones great houses.
Classification is one of the main kinds of projects you can face in the world of data science. Accordingly, we were tasked by our instructor with executing a classification project for the third individual project. Since I began working in data science, I have always wanted to work with astronomical data. So, after doing some research, I settled on the SDSS data, which includes information about sky objects and their various features. Going into this project, I aimed to classify sky objects such as stars, galaxies, and quasars via their spectroscopic and photometric features. SDSS holds these features in two different tables, so I had to merge them with a SQL command. After the merge, I added the table to my project folder. Well, I have my data, so it's time to work on the project's methodology.
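The two-table merge can be sketched with a SQL join. The snippet below uses an in-memory SQLite database as a stand-in for the SDSS query server, and the table and column names are illustrative rather than the real SDSS schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Photometric and spectroscopic features live in separate tables,
# linked by a shared object id.
con.execute("CREATE TABLE photo (objid INTEGER, u REAL, g REAL)")
con.execute("CREATE TABLE spec (objid INTEGER, redshift REAL, cls TEXT)")
con.executemany("INSERT INTO photo VALUES (?, ?, ?)",
                [(1, 19.2, 18.1), (2, 20.5, 19.7)])
con.executemany("INSERT INTO spec VALUES (?, ?, ?)",
                [(1, 0.04, 'GALAXY'), (2, 1.30, 'QSO')])

# Join the two feature tables into one row per object, as in the SDSS query.
rows = con.execute("""
    SELECT p.objid, p.u, p.g, s.redshift, s.cls
    FROM photo AS p JOIN spec AS s ON p.objid = s.objid
    ORDER BY p.objid
""").fetchall()
print(rows)
```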
In data science projects, the first thing a data scientist does is get to know the data, examine its statistics, and examine the relationships between variables; we generally call this exploratory data analysis. In this analysis, I focused on the question of what kind of paid application a developer should build, using App Store data from Kaggle. Now that we all have smartphones, it is not surprising that mobile applications are so common in our lives. Developing a mobile application is one of the most profitable and, at the same time, easiest ways to make money, and many freelance iOS developers contribute to this market. Let's say you are a freelance software developer who wants to contribute to this market. So, in which category might it be more profitable to build an application, and how can we determine the size and price within the chosen category so that users turn to it? Let's look for the answers to these questions together.