publications
publications by category in reverse chronological order. generated by jekyll-scholar.
2025
- [Cities] Do You Know Your Neighborhood? Integrating Street View Images and Multi-task Learning for Fine-Grained Multi-Class Neighborhood Wealthiness Perception Prediction. Yang Qiu, Meiliu Wu, Qunying Huang, and 1 more author. Cities, 2025.
The assessment of urban wealthiness is fundamental to effective urban planning and development. However, conventional methodologies often rely on aggregated datasets, such as census data, with a coarse-grained resolution at the census tract level, impeding accurate evaluation of wealthiness in individual neighborhoods and failing to capture spatial heterogeneity. This study proposes a novel approach to predict urban wealthiness at a point-scale spatial resolution by utilizing geo-tagged street view images as input for deep learning models, thereby simulating human perception of urban built environments. Using the Place Pulse 2.0 dataset, which contains over 1.2 million pairwise comparisons of 110,988 street view images from 56 cities worldwide for different urban environment perception factors (e.g., safety and wealthiness), we developed deep learning models based on the Swin Transformer and Multi-gate Mixture-of-Experts (MMOE), a multi-task learning architecture. These models extract and integrate visual features of surrounding elements, including buildings, parks, and vehicles, to classify the wealthiness of specific geo-locations into three categories: Impoverished, Middle, and Affluent. To enhance model training and ground truth data, we modified the TrueSkill Rating System, used for scoring neighborhoods via pairwise street view image comparisons, by considering temporal decay and spatial autocorrelation factors. These modifications improved the normality of the wealthiness score distribution, reducing the standard deviation from 5.385 to 4.302 and skewness from −0.055 to −0.024. Consequently, model performance improved consistently, with accuracy increases observed in Swin Transformer (63% to 68%), ViT (54% to 58%), and ResNet50 (51% to 56%). In addition, the proposed MMOE model demonstrates a significant improvement in the differentiation and classification of wealth categories within the three-class system (Impoverished, Middle, Affluent). It achieves an overall accuracy of 82%, outperforming the baseline models, Swin Transformer, ViT, and ResNet50, by 14%, 24%, and 26%, respectively. Additionally, we compared our model’s predictions with average household income data at the census block group level to elucidate its strengths and limitations. Experimental results demonstrated the efficacy of using geo-tagged street view images for predicting urban wealthiness across diverse geographic and environmental contexts. Our findings also highlight the importance of integrating both quantitative and qualitative evaluations in the prediction of urban environmental factors. By synthesizing human perceptions with advanced deep learning techniques, our approach offers a nuanced understanding of urban wealthiness, providing valuable insights for urban planning and development strategies.
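To make the rating modification concrete, here is a minimal sketch of scoring street view images from pairwise "which looks wealthier?" comparisons with an exponential temporal decay, using the `trueskill` Python package. The decay form, the half-life value, and the use of TrueSkill's partial-play weights as a decay proxy are illustrative assumptions, not the authors' exact formulation; the spatial autocorrelation term is omitted.

```python
# Sketch: pairwise wealthiness scoring with temporal decay (assumed form).
import trueskill  # pip install trueskill

env = trueskill.TrueSkill(draw_probability=0.10)  # assumed draw rate

def decay_weight(days_old: float, half_life: float = 365.0) -> float:
    """Exponential temporal decay: older comparisons influence updates less."""
    return 0.5 ** (days_old / half_life)

def update_scores(winner, loser, days_old):
    """Update two image ratings from one pairwise comparison."""
    w = decay_weight(days_old)
    (new_w,), (new_l,) = env.rate(
        [(winner,), (loser,)],
        ranks=[0, 1],                 # winner ranked first
        weights=[(w,), (w,)],         # partial-play weight as decay proxy
    )
    return new_w, new_l

a, b = env.create_rating(), env.create_rating()
a, b = update_scores(a, b, days_old=200.0)
print(a.mu, b.mu)  # winner's mean rises, loser's falls, scaled by the decay
```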
- [npj Urban Sustainability] Urban visual-spatial intelligence: linking human and sensor perception for sustainable urban development. Qihao Weng, Qianbao Hou, Zhixing Chen, and 8 more authors. npj Urban Sustainability, 2025.
Understanding urban perception enhances urban intelligence research, aiding sustainable development and smart cities. Urban Visual-Spatial Intelligence (UVSI) integrates human and sensor perception. This Perspective explores UVSI’s potential for sustainable development, identifies research gaps, and outlines future priorities. Advances in AI, high-performance computing, real-time data processing, and citizen science could significantly impact UVSI. Key research areas include spatiotemporal data integration, visual analytics, ethical frameworks, and inclusive methodologies for AI-driven urban management.
- [Remote Sensing] Advancing Self-Supervised Learning for Building Change Detection and Damage Assessment: Unified Denoising Autoencoder and Contrastive Learning Framework. Songxi Yang, Bo Peng, Tang Sui, and 2 more authors. Remote Sensing, 2025.
Building change detection and building damage assessment are two essential tasks in post-disaster analysis. Building change detection focuses on identifying changed building areas between bi-temporal images, while building damage assessment involves segmenting all buildings and classifying their damage severity. These tasks play a critical role in disaster response and urban development monitoring. Although supervised learning has significantly advanced building change detection and damage assessment, its reliance on large labeled datasets remains a major limitation. In contrast, self-supervised learning enables the extraction of meaningful data representations without explicit training labels. To address this challenge, we propose a self-supervised learning approach that unifies denoising autoencoders and contrastive learning, enabling effective data representation for building change detection and damage assessment. The proposed architecture integrates a dual denoising autoencoder with a Vision Transformer backbone and contrastive learning strategy, complemented by a Feature Pyramid Network-ResNet dual decoder and an Edge Guidance Module. This design enhances multi-scale feature extraction and enables edge-aware segmentation for accurate predictions. Extensive experiments were conducted on five public datasets, including xBD, LEVIR, LEVIR+, SYSU, and WHU, to evaluate the performance and generalization capabilities of the model. The results demonstrate that the proposed Denoising AutoEncoder-enhanced Dual-Fusion Network (DAEDFN) approach achieves competitive performance compared with fully supervised methods. On the xBD dataset, the largest dataset for building damage assessment, our proposed method achieves an F1 score of 0.892 for building segmentation, outperforming state-of-the-art methods. For building damage severity classification, the model achieves an F1 score of 0.632. On the building change detection datasets, the proposed method achieves F1 scores of 0.837 (LEVIR), 0.817 (LEVIR+), 0.768 (SYSU), and 0.876 (WHU), demonstrating model generalization across diverse scenarios. Despite these promising results, challenges remain in complex urban environments, small-scale changes, and fine-grained boundary detection. These findings highlight the potential of self-supervised learning in building change detection and damage assessment tasks.
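As a rough illustration of the unified objective, the sketch below combines a denoising reconstruction loss with an InfoNCE contrastive loss in PyTorch. The tiny convolutional encoder/decoder stands in for the paper's ViT backbone and FPN-ResNet dual decoder; all layer sizes, the noise model, and the loss weighting are assumptions.

```python
# Sketch: joint denoising-autoencoder + contrastive objective (assumed sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenoisingContrastiveModel(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.encoder = nn.Sequential(  # stand-in for the ViT backbone
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(  # stand-in for the FPN-ResNet decoder
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )
        self.proj = nn.Linear(64, dim)  # projection head for contrastive loss

    def forward(self, x_noisy):
        z = self.encoder(x_noisy)
        recon = self.decoder(z)
        emb = F.normalize(self.proj(z.mean(dim=(2, 3))), dim=1)
        return recon, emb

def info_nce(emb_a, emb_b, temperature=0.1):
    """Contrastive loss: matching views attract, other batch items repel."""
    logits = emb_a @ emb_b.t() / temperature
    targets = torch.arange(emb_a.size(0))
    return F.cross_entropy(logits, targets)

model = DenoisingContrastiveModel()
clean = torch.rand(8, 3, 64, 64)
noisy_a = clean + 0.1 * torch.randn_like(clean)   # two noisy views
noisy_b = clean + 0.1 * torch.randn_like(clean)
recon_a, emb_a = model(noisy_a)
recon_b, emb_b = model(noisy_b)
loss = F.mse_loss(recon_a, clean) + F.mse_loss(recon_b, clean) \
     + info_nce(emb_a, emb_b)
loss.backward()
```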
- [Urban Human Mobility] From Human Mobility to Social Segregation: What Insights Can Social Media Data Provide? Meiliu Wu and Qunying Huang. In Urban Human Mobility, 2025.
This chapter investigates how social media data can reveal links between human mobility and social segregation in urban environments. Leveraging over 37 million geotagged tweets from the top 50 most populous U.S. cities, we infer each user’s race-ethnicity and economic status to examine movement patterns across diverse sociodemographic backgrounds. By identifying individuals’ activity zones and their corresponding neighborhood characteristics, the study provides a fine-grained perspective on mobility-based segregation, offering insights beyond traditional, residence-based segregation measures. In a detailed case study of the Los Angeles–Long Beach–Anaheim urban area, we develop and apply person-based segregation indices to quantify how various racial-ethnic and economic groups interact within their daily activity spaces. Results show that racial-ethnic minorities and economically disadvantaged groups tend to have shorter travel distances and more spatially restricted activity zones, reflecting limited integration into the broader urban landscape. The findings highlight the interplay of race-ethnicity and economic status in shaping mobility patterns and experienced segregation. Although this approach is innovative, challenges remain, such as potential biases in user participation and data quality. Future research should integrate multiple data sources, refine user inference methods, and expand globally to enhance our understanding of evolving urban inequalities.
- [ICCV] Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation. Siyu Chen, Ting Han, Changshe Zhang, and 4 more authors. arXiv preprint arXiv:2504.12753, 2025.
Vision Foundation Models (VFMs) have delivered remarkable performance in Domain Generalized Semantic Segmentation (DGSS). However, recent methods often overlook the fact that visual cues are susceptible to change, whereas the underlying geometry remains stable, rendering depth information more robust. In this paper, we investigate the potential of integrating depth information with features from VFMs to improve the geometric consistency within an image and boost the generalization performance of VFMs. We propose a novel fine-tuning DGSS framework, named DepthForge, which integrates the visual cues from frozen DINOv2 or EVA02 and depth cues from frozen Depth Anything V2. In each layer of the VFMs, we incorporate depth-aware learnable tokens to continuously decouple domain-invariant visual and spatial information, thereby enhancing the depth awareness and attention of the VFMs. Finally, we develop a depth refinement decoder and integrate it into the model architecture to adaptively refine multi-layer VFM features and depth-aware learnable tokens. Extensive experiments are conducted based on various DGSS settings and five different datasets as unseen target domains. The qualitative and quantitative results demonstrate that our method significantly outperforms alternative approaches with stronger performance, steadier visual-spatial attention, and superior generalization ability. In particular, DepthForge exhibits outstanding performance under extreme conditions (e.g., night and snow). Code is available at https://github.com/SY-Ch/DepthForge.
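The core mechanism, attaching depth-aware learnable tokens to frozen VFM features, might look like the following PyTorch sketch. The cross-attention fusion, token count, and dimensions are assumptions made for illustration; consult the linked repository for the actual design.

```python
# Sketch: depth-aware learnable tokens fused with frozen VFM features
# (cross-attention fusion is an assumed mechanism, not the confirmed one).
import torch
import torch.nn as nn

class DepthAwareTokens(nn.Module):
    def __init__(self, num_tokens=8, dim=768):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(num_tokens, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, vfm_feats, depth_feats):
        # vfm_feats, depth_feats: (B, N, dim), e.g. from frozen DINOv2
        # and a frozen depth model such as Depth Anything V2
        b = vfm_feats.size(0)
        queries = self.tokens.unsqueeze(0).expand(b, -1, -1)
        depth_tokens, _ = self.attn(queries, depth_feats, depth_feats)
        return torch.cat([vfm_feats, depth_tokens], dim=1)  # appended tokens

fuse = DepthAwareTokens()
out = fuse(torch.rand(2, 196, 768), torch.rand(2, 196, 768))
print(out.shape)  # torch.Size([2, 204, 768])
```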
- [Spatial Demography] Segregation: What is in a Name? A Review of Segregation Measurement and a Prospective Framework. Meiliu Wu, David W. S. Wong, and Qunying Huang. Spatial Demography, 2025.
Previous studies have attempted to define segregation succinctly but have not reached a consensus. Adding to the unsettled debates was the shift from using place-based to person-based data in segregation studies. Deduced from the literature, the meanings of segregation may be summarized by two essences, measuring the distribution of and the interaction among different population groups, and measures may be characterized by four elements of segregation: dimensions, spatial extent, social extent, and data type. Besides showing a trend from using spatially aggregated place-based data to individual-level person-based data, a review of selected papers confirms the salience of distribution and interaction in measuring segregation and highlights the limitations of existing measures in capturing social interaction between population groups and the impacts of segregation, even in recent studies that utilize person-based data. Finally, we propose a segregation measurement framework to accommodate the increasingly hybrid physical-virtual world using person-based data with three components: distributions of different groups, interactions among groups, and inequality experienced by different groups due to disparities in accessing resources and opportunities. This review paper charts directions for future studies using segregation measures.
- [ISPRS Annals] Advancing Mixed Land Use Detection by Embedding Spatial Intelligence into Vision-Language Models. Meiliu Wu, Qunying Huang, and Song Gao. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2025.
Embedding spatial intelligence into vision-language models (VLMs) has offered a promising avenue to improve geospatial decision-making in complex urban environments. In this work, we propose a novel framework that augments the architecture of Contrastive Language-Image Pretraining (CLIP) with the techniques of spatial-context aware prompt engineering and spatially explicit contrastive learning. By leveraging a diverse set of geospatial imagery (e.g., street view, satellite, and map tile images), paired with contextual geospatial text generated and curated via GPT-4, our approach constructs robust multimodal representations that capture visual, textual, and spatial insights. The proposed model, termed GeospatialCLIP, is specifically evaluated for urban mixed land use detection, a critical task for sustainable urban planning and smart city development. Results demonstrate that GeospatialCLIP consistently outperforms traditional vision-based few-shot models (e.g., ResNet-152, Vision Transformers) and exhibits competitive performance with state-of-the-art models such as GPT-4. Notably, the incorporation of spatial prompts, especially those providing city-specific cues, significantly boosts detection accuracy. Our findings highlight the pivotal role of spatial intelligence in refining VLM performance and provide novel insights into the integration of geospatial reasoning within multimodal learning. Overall, this work establishes a foundation for future spatially explicit AI development and applications, paving the way for more comprehensive and interpretable models in urban analytics and beyond.
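A hedged sketch of the zero-shot flavor of this idea: scoring candidate land use labels for a street view image with CLIP prompts that carry a city-specific spatial cue. The prompt templates, label set, checkpoint, and file name are illustrative assumptions, not the GeospatialCLIP configuration.

```python
# Sketch: zero-shot land use scoring with spatially cued CLIP prompts.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

land_uses = ["residential", "commercial", "industrial",
             "mixed residential and commercial"]
city = "Madison, Wisconsin"  # city-specific spatial cue (assumed wording)
prompts = [f"a street view photo of a {lu} area in {city}" for lu in land_uses]

image = Image.open("street_view.jpg")  # hypothetical geo-tagged image
inputs = processor(text=prompts, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
for lu, p in zip(land_uses, probs[0].tolist()):
    print(f"{lu}: {p:.3f}")
```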
2024
- [JAG] BiAU-Net: Wildfire burnt area mapping using bi-temporal Sentinel-2 imagery and U-Net with attention mechanism. Tang Sui, Qunying Huang, Mingda Wu, and 2 more authors. International Journal of Applied Earth Observation and Geoinformation, 2024.
The fusion of remote sensing and artificial intelligence, particularly deep learning, offers substantial opportunities for developing innovative methods in rapid disaster mapping and damage assessment. However, current models for wildfire burnt area detection and mapping struggle with imbalanced training samples in which non-burnt areas are oversampled, with boundary areas containing a mix of burnt and unburnt pixels, and with regions of varying environmental contexts, leading to poor model generalizability. In response, this paper proposes a novel U-Net based model, known as BiAU-Net, which incorporates attention mechanisms and a well-designed loss function, enabling the model to focus on burnt areas and improve accuracy and efficiency, especially in detecting edges and small areas. Unlike traditional single-input U-Net models for image segmentation, the proposed BiAU-Net incorporates temporal changes with two inputs, pre- and post-fire Sentinel-2 imagery, enhancing performance across diverse environmental areas. Five independent areas from different continents are selected as study cases, one for training the model and all five for testing, to demonstrate the generalizability of the proposed model. We used the Fire Disturbance Climate Change Initiative v5.1 product from the European Space Agency as a baseline for model evaluation. The experiment results indicate that BiAU-Net: (1) significantly outperformed the baseline with improvements of 11.56% in Overall Accuracy, 29.08% in Precision, 7.06% in Recall, 19.90% in F1-score, 15.44% in Balanced Accuracy, 29.90% in Kappa Coefficient, and 28.29% in Matthews Correlation Coefficient (MCC); (2) largely surpassed the performance of U-Net and its variants in most study areas; (3) demonstrated good generalizability in five testing areas across different continents; and (4) achieved the highest overall performance compared to state-of-the-art wildfire burnt area detection models, evidenced by the highest F1-score and MCC values.
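The bi-temporal input design can be illustrated with a short PyTorch stub: pre- and post-fire Sentinel-2 patches are stacked along the channel axis before entering a U-Net-style encoder. The attention mechanism and custom loss from the paper are omitted, and the band count and layer sizes are assumptions.

```python
# Sketch: bi-temporal input handling in the spirit of BiAU-Net.
import torch
import torch.nn as nn

class BiTemporalUNetStub(nn.Module):
    def __init__(self, bands=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(2 * bands, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(128, 1, 1),  # per-pixel burnt / not-burnt logit
        )

    def forward(self, pre, post):
        x = torch.cat([pre, post], dim=1)  # temporal change encoded as channels
        return self.head(self.encoder(x))

model = BiTemporalUNetStub()
pre = torch.rand(4, 10, 256, 256)   # pre-fire Sentinel-2 patch
post = torch.rand(4, 10, 256, 256)  # post-fire Sentinel-2 patch
logits = model(pre, post)
print(logits.shape)  # torch.Size([4, 1, 256, 256])
```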
- [AGILE] Occupation prediction with multimodal learning from tweet messages and Google Street View images. Xinyi Liu, Bo Peng, Meiliu Wu, and 3 more authors. AGILE: GIScience Series, 2024.
Despite the development of various heuristic and machine learning models, social media user occupation prediction remains challenging due to limited high-quality ground truth data and difficulties in effectively integrating multiple data sources in different modalities, which can be complementary and help inform the profession or job role of an individual. In response, this study introduces a novel semi-supervised multimodal learning method for Twitter user occupation prediction with a limited number of training samples. Specifically, an unsupervised learning model is first designed to extract textual and visual embeddings from individual tweet messages (textual) and Google Street View images (visual), with the latter capturing the geographical and environmental context surrounding individuals’ residential and workplace areas. Next, these high-dimensional multimodal features are fed into a multilayer transfer learning model for individual occupation classification. The proposed occupation prediction method achieves high evaluation scores for identifying Office workers, Students, and Others or Jobless people, with the F1 score for identifying Office workers surpassing the best previously reported scores for occupation classification using social media data.
2023
- [JAG] Mixed land use measurement and mapping with street view images and spatial context-aware prompts via zero-shot multimodal learning. Meiliu Wu, Qunying Huang, Song Gao, and 1 more author. International Journal of Applied Earth Observation and Geoinformation, 2023.
Traditional overhead imagery techniques for urban land use detection and mapping often lack the precision needed for accurate, fine-grained analysis, particularly in complex environments with multi-functional, multi-story buildings. To bridge the gap, this study introduces a novel approach, utilizing ground-level street view images geo-located at the point level, to provide more concrete, subtle, and informative visual characteristics for urban mixed land use analysis, addressing the two major limitations of overhead imagery: coarse resolution and insufficient visual information. Given that spatial context-aware land-use descriptions are commonly employed to describe urban environments, this study treats mixed land use detection as a Natural Language for Visual Reasoning (NLVR) task, i.e., classifying land use(s) in images based on the similarity of their visual characteristics and local descriptive land use contexts, by integrating street view images (vision) with spatial context-aware land use descriptions (language) through vision-language multimodal learning. The results indicate that our multimodal approach significantly outperforms traditional vision-based methods and can accurately capture the multiple functionalities of ground features. It benefits from the incorporation of spatial context-aware prompts, though the geographic scale of the geo-locations matters. Additionally, our approach marks a significant advancement in mixed land use mapping, achieving point-level precision. It allows for the representation of diverse land use types at point locations, offering the flexibility of mapping at various spatial resolutions, including census tracts and zoning districts. This approach is particularly effective in areas with diverse urban functionalities, facilitating a more fine-grained and detailed perspective on mixed land uses in urban settings.
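One way to picture the spatial context-aware prompts is as text summaries of the land use mix around each geo-located image; the snippet below builds such a description from hypothetical parcel records. The data schema, radius, and wording are illustrative assumptions, not the paper's exact prompt design.

```python
# Sketch: building a spatial context-aware land use prompt from nearby parcels.
from collections import Counter

def build_context_prompt(nearby_parcels, radius_m=100):
    """Summarize land uses around a street view point into a text prompt."""
    counts = Counter(p["land_use"] for p in nearby_parcels)
    total = sum(counts.values())
    parts = [f"{100 * n / total:.0f}% {lu}" for lu, n in counts.most_common()]
    return (f"a street view photo taken within {radius_m} m of parcels "
            f"zoned as {', '.join(parts)}")

parcels = [  # hypothetical parcel records around one geo-location
    {"land_use": "residential"}, {"land_use": "residential"},
    {"land_use": "commercial"}, {"land_use": "mixed use"},
]
print(build_context_prompt(parcels))
# a street view photo taken within 100 m of parcels zoned as
# 50% residential, 25% commercial, 25% mixed use
```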
- [CEUS] Revealing racial-ethnic segregation with individual experienced segregation indices based on social media data: A case study in Los Angeles-Long Beach-Anaheim. Meiliu Wu, Xinyi Liu, Yuehan Qin, and 1 more author. Computers, Environment and Urban Systems, 2023.
While recent studies have started to measure experienced racial-ethnic segregation across activity space (beyond “residential” segregation), insufficient efforts have been devoted to revealing the experienced segregation levels of racial-ethnic minorities (e.g., Asian, Hispanic, Native, and Multi-race groups), mainly due to the lack of a comprehensive probe into various individual-level datasets. This issue leads to an unnoted deficiency in this field: the failure to capture the “directed” interactions between any pair of groups. To bridge these gaps, this work proposed a unified framework that uses social media data to infer both individuals’ mobility patterns and user profiles (i.e., race-ethnicity and economic status), including more racial-ethnic minorities for a more comprehensive estimation of individual experienced segregation. With this inferred information, we developed two novel individual experienced segregation indices: the individual experienced exposure index (EEI) to each race-ethnicity and the individual experienced diversity index (EDI). Moreover, we integrated these two indices with the spatial impacts among mobility-based activity locations based on distance-decay functions, which are often neglected by existing studies. Using Los Angeles-Long Beach-Anaheim as the study case, we discovered several important findings. First, while experienced isolation (i.e., mainly having intra-group interaction) persists among all groups, the Asian group is the most diverse in inter-group interaction. Second, individuals who live closer together tend to have similar levels of inter-group interaction. Moreover, exposures to both White and Black are negatively correlated with exposure to Hispanic. On top of that, individuals with a higher economic status are likely less interested in inter-group interaction (except for those from the Hispanic group), along with more exposure to both White and Black, but less exposure to Hispanic.
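A minimal sketch of a distance-decay-weighted exposure index in the spirit of EEI follows; the exact decay function, parameter values, and index formulation in the paper may differ, and the input data are hypothetical.

```python
# Sketch: person-based experienced exposure with exponential distance decay.
import math

def experienced_exposure(visits, target_group, beta=0.001):
    """
    visits: list of (distance_from_home_m, {group: share}) for one person's
            activity locations; shares are the local racial-ethnic composition.
    Returns the distance-decay-weighted exposure to `target_group`.
    """
    num = den = 0.0
    for dist, composition in visits:
        w = math.exp(-beta * dist)          # closer activities weigh more
        num += w * composition.get(target_group, 0.0)
        den += w
    return num / den if den else 0.0

person_visits = [  # hypothetical activity locations
    (500,  {"White": 0.6, "Hispanic": 0.3, "Black": 0.1}),
    (8000, {"White": 0.2, "Hispanic": 0.7, "Black": 0.1}),
]
print(round(experienced_exposure(person_visits, "Hispanic"), 3))
```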
- [Remote Sensing] Near real-time flood mapping with weakly supervised machine learning. Jirapa Vongkusolkit, Bo Peng, Meiliu Wu, and 2 more authors. Remote Sensing, 2023.
Advances in deep learning and computer vision are making significant contributions to flood mapping, particularly when integrated with remotely sensed data. Although existing supervised methods, especially deep convolutional neural networks, have proved to be effective, they require intensive manual labeling of flooded pixels to train a multi-layer deep neural network that learns abstract semantic features of the input data. This research introduces a novel weakly supervised approach for pixel-wise flood mapping by leveraging multi-temporal remote sensing imagery and image processing techniques (e.g., Normalized Difference Water Index and edge detection) to create weakly labeled data. Using these weakly labeled data, a bi-temporal U-Net model is then proposed and trained for flood detection without the need for time-consuming and labor-intensive human annotations. Using floods from Hurricanes Florence and Harvey as case studies, we evaluated the performance of the proposed bi-temporal U-Net model and baseline models, such as decision tree, random forest, gradient boost, and adaptive boosting classifiers. To assess the effectiveness of our approach, we conducted a comprehensive assessment that (1) covered multiple test sites with varying degrees of urbanization, and (2) utilized both bi-temporal (i.e., pre- and post-flood) and uni-temporal (i.e., only post-flood) input. The experimental results showed that the proposed framework of weakly labeled data generation and the bi-temporal U-Net could produce near real-time urban flood maps with consistently high precision, recall, F1 score, IoU score, and overall accuracy compared with baseline machine learning algorithms.
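The weak-label generation step can be sketched in a few lines of NumPy: threshold the Normalized Difference Water Index (NDWI) on pre- and post-flood scenes and label pixels that turn into water. The band ordering and threshold value are assumptions for illustration.

```python
# Sketch: NDWI-based weak flood labels from bi-temporal imagery.
import numpy as np

def ndwi(green: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """NDWI = (Green - NIR) / (Green + NIR); high values indicate water."""
    return (green - nir) / (green + nir + 1e-6)

def weak_flood_label(pre, post, threshold=0.2):
    """Pixels that become water after the event get a weak 'flooded' label."""
    water_pre = ndwi(pre[1], pre[3]) > threshold    # bands (B,G,R,NIR) assumed
    water_post = ndwi(post[1], post[3]) > threshold
    return (water_post & ~water_pre).astype(np.uint8)

pre = np.random.rand(4, 256, 256).astype(np.float32)   # pre-flood scene
post = np.random.rand(4, 256, 256).astype(np.float32)  # post-flood scene
label = weak_flood_label(pre, post)
print(label.sum(), "weakly labeled flooded pixels")
```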
- [Geoinformatics] Measuring Access Inequality in A Hybrid Physical-Virtual World: A Case Study of Racial Disparity of Healthcare Access During COVID-19. Meiliu Wu, Qunying Huang, and Song Gao. In 2023 30th International Conference on Geoinformatics, 2023.
The disparity of resource access (e.g., food and healthcare) among different population groups essentially reflects social inequalities. Emerging information and communications technology (ICT) has facilitated teleactivities that can replace or complement traditional physical visits, yet existing approaches for measuring access disparity still fail to consider virtual interactions. To this end, this study proposes a unified framework to measure access inequality in both physical and virtual spaces simultaneously, using POI visit patterns from mobile phone data to capture the spatial unevenness of accessibility among different groups (physical space), as well as Household Pulse Survey data to reveal the group disparity in teleactivity access (virtual space). In particular, a novel Access Inequity Index is proposed based on the Information Theory Index and the Theil Index, to reveal access inequality in physical and virtual spaces, respectively. Next, to demonstrate the feasibility of the proposed framework, we examine racial groups and their disparity in physical-virtual healthcare access during the pandemic (April-July 2021) in the 15 most populated U.S. metropolitan areas. Our results indicate that: (1) race is a significant risk marker for underlying conditions that affect health, including telehealth access; (2) the usage of telehealth access aligns with the risk of COVID-19 infection, hospitalization, and death by race (e.g., the minority groups Black and Others are more vulnerable and in higher demand of telehealth services); and (3) residential segregation shapes the segregated pattern of physical healthcare access by race (e.g., the Black-dominant healthcare access zone closely matches the Black residential cluster in the south of Chicago), while such impacts may differ across different kinds of healthcare services (e.g., physicians, mental health practitioners, and dentists). Compared with traditional single-space approaches, the proposed hybrid-spaces approach not only provides more comprehensive and in-depth insights into the racial disparity in healthcare access, but also brings new opportunities to a broad scope of future studies in social inequality measurement.
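As one ingredient of the proposed Access Inequity Index, the Theil index over group-specific access levels can be computed as below; the paper's combination with the Information Theory Index is not shown, and the input values are hypothetical.

```python
# Sketch: Theil T index over group-specific access levels.
import math

def theil_index(values):
    """Theil T index: 0 = perfectly even access; higher = more unequal."""
    n = len(values)
    mean = sum(values) / n
    return sum((v / mean) * math.log(v / mean) for v in values if v > 0) / n

# Hypothetical telehealth access rates per racial group in one metro area
access_by_group = [0.42, 0.31, 0.18, 0.09]
print(round(theil_index(access_by_group), 4))
```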
- [ICBDT] Pixel-wise Wildfire Burn Severity Classification with Bi-temporal Sentinel-2 Data and Deep Learning. Mingda Wu, Qunying Huang, Tang Sui, and 1 more author. In Proceedings of the 2023 6th International Conference on Big Data Technologies, 2023.
Wildfires result in extensive damage and pose significant threats to the natural environment and human society. Accurate and timely assessment of wildfire damage is critical for effective post-fire management and restoration efforts. Much progress has been made in monitoring and mapping burned areas based on manual feature extraction and machine learning. However, current approaches are often hindered by several limitations, including poor generalizability, complex and time-consuming procedures, and a primary focus on binary classification that only distinguishes between burned and non-burned areas. As such, this paper develops an advanced framework that not only can reliably detect burned areas, but also has the capability to assess the burn severity (low, medium, and high) of the detected areas. Within this framework, pixel-wise multiclass models for detecting wildfire burn severity level are developed based on the standard U-Net architecture using pre- and post-fire Sentinel-2 remote sensing images. The experimental results demonstrated the power of the Bi-temporal U-Net model for both multi-class and binary mapping of wildfire detection, achieving an overall accuracy (OA) of over 95%, and outperformed baseline algorithms, including fully convolutional network, random forest, eXtreme gradient boosting, and support vector machine, with an average improvement of 18% in F1-score and 15% in mean intersection over union.
2022
- [SIGSPATIAL] Im2city: image geo-localization via multi-modal learning. Meiliu Wu and Qunying Huang. In Proceedings of the 5th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery, 2022.
This study investigated multi-modal learning as a stand-alone solution to image geo-localization problems. Based on successful trials with the contrastive language-image pre-training (CLIP) model, we developed GEo-localization Multi-modal (GEM) models, which not only learn the visual features from input images, but also integrate the labels with corresponding geo-location context to generate textual features, which in turn are fused with the visual features for image geo-localization. We demonstrated that simply utilizing the image itself and appropriate contextualized prompts (i.e., mechanisms to integrate labels with geo-location context as textual features) is effective for global image geo-localization, which traditionally requires large amounts of geo-tagged images for image matching. Moreover, due to the integration of natural language, our GEM models are able to learn the spatial proximity of geo-contextualized labels (i.e., their spatial closeness), which is often neglected by classification-based geo-localization methods. In particular, the proposed Zero-shot GEM model (i.e., geo-contextualized prompt tuning on CLIP) outperforms the state-of-the-art model, Individual Scene Networks (ISN), obtaining 4.1% and 49.5% accuracy improvements on the two benchmark datasets, IM2GPS3k and Place Pulse 2.0 (i.e., 22k street view images across 56 cities worldwide), respectively. In addition, our proposed Linear-probing GEM model (i.e., CLIP’s image encoder linearly trained on street view images) outperforms ISN even more significantly, obtaining 16.8% and 71.0% accuracy improvements, respectively. By exploring optimal geographic scales (e.g., city-level vs. country-level), training datasets (street view images vs. random online images), and pre-trained models (e.g., ResNet vs. CLIP for linear probing), this research sheds light on integrating textual features with visual features for image geo-localization and beyond.
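The Zero-shot GEM idea, geo-contextualized prompt tuning on CLIP, can be approximated with an off-the-shelf CLIP checkpoint as below. The prompt wording, candidate city list, checkpoint, and file name are assumptions for illustration, not the paper's exact setup.

```python
# Sketch: zero-shot city prediction with geo-contextualized CLIP prompts.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

cities = [("Chicago", "Illinois, United States"),
          ("Tokyo", "Japan"),
          ("Paris", "France")]
# Geo-contextualized prompts: each label carries its geographic context
prompts = [f"a street view photo taken in {c}, {ctx}" for c, ctx in cities]

image = Image.open("query.jpg")  # hypothetical query image
inputs = processor(text=prompts, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]
print("predicted city:", cities[int(probs.argmax())][0])
```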
- [Scientific Reports] Graph-based representation for identifying individual travel activities with spatiotemporal trajectories and POI data. Xinyi Liu, Meiliu Wu, Bo Peng, and 1 more author. Scientific Reports, 2022.
Individual daily travel activities (e.g., work, eating) are identified with various machine learning models (e.g., Bayesian Network, Random Forest) for understanding people’s frequent travel purposes. However, labor-intensive engineering work is often required to extract effective features. Additionally, features and models are mostly calibrated for individual trajectories with regular daily travel routines and patterns, and therefore suffer from poor generalizability when applied to new trajectories with more irregular patterns. Meanwhile, most existing models cannot extract features that explicitly represent regular travel activity sequences. Therefore, this paper proposes a graph-based representation of spatiotemporal trajectories and point-of-interest (POI) data for travel activity type identification, named Gstp2Vec. Specifically, a weighted directed graph is constructed by connecting regular activity areas (i.e., zones), detected via clustering individual daily travel trajectories, as graph nodes, with edges denoting trips between pairs of zones. Statistics of trajectories (e.g., visit frequency, activity duration) and POI distributions (e.g., percentage of restaurants) at each activity zone are encoded as node features. Next, trip frequency, average trip duration, and average trip distance are encoded as edge weights. Then a series of feedforward neural networks are trained to generate low-dimensional embeddings for activity nodes by sampling and aggregating spatiotemporal and POI features from their multihop neighborhoods. Activity type labels collected via travel surveys are used as ground truth for backpropagation. The experiment results with real-world GPS trajectories show that Gstp2Vec significantly reduces feature engineering efforts by automatically learning feature embeddings from raw trajectories with minimal preprocessing. It not only enhances model generalizability, achieving higher identification accuracy on test trajectories with diverse travel patterns, but also obtains better efficiency and robustness. In particular, our identification of the most common daily travel activities (e.g., Dwelling and Work) for people with diverse travel patterns outperforms state-of-the-art classification models.
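The graph construction behind Gstp2Vec can be pictured with a small networkx example: activity zones become nodes carrying trajectory and POI statistics, and trips become weighted directed edges. Feature names and values below are illustrative, and the embedding network that samples and aggregates from multihop neighborhoods is not shown.

```python
# Sketch: activity-zone graph in the spirit of Gstp2Vec (illustrative values).
import networkx as nx

g = nx.DiGraph()

# Nodes: regular activity zones detected by clustering daily trajectories,
# with trajectory statistics and POI distributions as node features
g.add_node("zone_home", visit_freq=310, avg_duration_h=9.5, pct_restaurants=0.05)
g.add_node("zone_work", visit_freq=240, avg_duration_h=8.1, pct_restaurants=0.20)
g.add_node("zone_gym",  visit_freq=90,  avg_duration_h=1.2, pct_restaurants=0.02)

# Edges: trips between zones, weighted by frequency, duration, and distance
g.add_edge("zone_home", "zone_work", trip_freq=230, avg_min=25, avg_km=12.0)
g.add_edge("zone_work", "zone_gym",  trip_freq=80,  avg_min=15, avg_km=4.5)
g.add_edge("zone_gym",  "zone_home", trip_freq=78,  avg_min=20, avg_km=9.0)

for u, v, d in g.edges(data=True):
    print(f"{u} -> {v}: {d}")
```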
- [Annals of GIS] Human movement patterns of different racial-ethnic and economic groups in US top 50 populated cities: What can social media tell us about isolation? Meiliu Wu and Qunying Huang. Annals of GIS, 2022.
Many studies have proven that human movement patterns are strongly impacted by individual socioeconomic and demographic background. While many efforts have been made to explore the influences of age and gender on movement patterns using social media, this study aims to analyse and compare the movement patterns of different racial-ethnic and economic groups using social media (i.e., geotagged tweets) from the U.S. top 50 populated cities. Results show that there are significant differences in the number of activity zones and median travel distance across cities and demographic groups, and that power-law distributions emerge in both spatial and demographic aspects. Additionally, the analysis of outbound-city travels demonstrates that some cities have slightly stronger interactions with others, and that economically disadvantaged populations and racial-ethnic minorities are more restricted in long-distance travel, indicating that their spatial mobility is more limited to the local scale. Lastly, an economically segregated movement pattern is discovered: upper-class neighbourhoods are mostly visited by the upper class, while lower-class neighbourhoods are mainly accessed by the lower class; however, some racial-ethnic groups can diversify this segregated pattern at the local scale.
2017
- [SIGSPATIAL] The impact of MTUP to explore online trajectories for human mobility studies. Xinyi Liu, Qunying Huang, Zhenlong Li, and 1 more author. In Proceedings of the 1st ACM SIGSPATIAL Workshop on Prediction of Human Mobility, 2017.
Social media data, which capture long-term personal travel activities as a set of space-time points (time series), have become widely used for human mobility studies. The space-time points representing individual activities are massive and need aggregation along the time dimension (in addition to the space dimension) to reveal temporal mobility patterns. During temporal aggregation, time series are sliced into different temporal layers, and the aggregation results can be impacted by four parameters: layer size (the time interval of each temporal layer), start placement (the start time of the first layer), the amount of overlap between two consecutive layers, and time series extent (the temporal scope of the dataset for aggregation). Different parameterizations result in different mobility patterns, known as the "Modifiable Temporal Unit Problem" (MTUP, by analogy with the "Modifiable Areal Unit Problem", or MAUP). While the general effects of MTUP are well examined in previous studies, MTUP is often ignored in trajectory reconstructions using sparse social media data. To fill this research gap, this paper explores the impact of different temporal aggregation schemas (parameterizations) on the discovery of human mobility patterns using geo-tagged tweets within a 3D geospatial analytical system. The case study reveals that MTUP is significant in the process of detecting an individual’s daily representative (regular) trajectories from sparse online footprints. Comprehensive analysis of multiple aggregation results with different parameters can improve the understanding of an individual’s regular daily travel patterns. The interactive analytical system and visualization methods proposed by this study can minimize MTUP impact and help avoid false arguments.
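The four aggregation parameters can be made concrete with a small slicing function over post timestamps; the parameter values and timestamps below are illustrative, not taken from the paper's case study.

```python
# Sketch: slicing a time series into temporal layers (the four MTUP parameters).
def temporal_layers(timestamps, layer_size, start, overlap, extent):
    """Slice [start, start + extent) into layers of `layer_size` hours that
    advance by (layer_size - overlap), returning the points in each layer."""
    step = layer_size - overlap
    layers, t = [], start
    while t < start + extent:
        layers.append([ts for ts in timestamps if t <= ts < t + layer_size])
        t += step
    return layers

posts = [1.5, 2.0, 7.25, 8.0, 8.5, 20.0, 26.5]  # hours since Monday 00:00
# layer size 6 h, start at hour 0, 2 h overlap, 48 h extent
for i, layer in enumerate(temporal_layers(posts, 6, 0, 2, 48)):
    if layer:
        print(f"layer {i}: {layer}")
```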