Merge overleaf-2024-11-01-1353 into main
yamanksingla authored Nov 1, 2024
2 parents e0a4c64 + de04fef commit 99cec78
Showing 4 changed files with 137 additions and 17 deletions.
44 changes: 35 additions & 9 deletions Conclusion.tex
@@ -3,19 +3,45 @@ \chapter{Conclusion and an Outlook for Future Work}
\label{chapter:conclusion}


This thesis has explored the intersection of communication theory, behavioral science, and artificial intelligence, with a particular focus on understanding and optimizing human behavior through large-scale modeling approaches. Our work builds upon the fundamental seven-factor model of communication—communicator, message, channel, time of receipt, receiver, time of behavior, and receiver's behavior—while leveraging unprecedented access to digital behavioral data to advance both explanatory and predictive approaches to behavioral science.
In the domain of persuasion strategy analysis, we have made significant contributions to understanding the mechanisms of influence in advertising. Through comprehensive research spanning marketing, social psychology, and machine learning literature, we developed the most extensive framework of generic persuasion strategies to date. This work was supported by the creation and release of pioneering datasets for studying persuasion strategies in both image and video advertisements. Our analysis established clear correlations between specific marketing campaign characteristics and measurable customer behaviors, providing valuable insights for both practitioners and researchers in the field of marketing communications.
Our development of Large Content and Behavior Models (LCBMs) represents a fundamental advancement in behavior modeling. Through careful analysis, we revealed that existing Large Language Models (LLMs), despite their remarkable capabilities in various domains, are inherently limited in modeling behavior due to the systematic removal of behavioral data during training. To address this limitation, we developed the LCBM approach, which integrates all seven factors of communication to create more comprehensive models of human behavior. To support future research in this area, we released extensive behavior instruction fine-tuning data derived from over 40,000 YouTube videos and 168 million Twitter posts. Additionally, we established new benchmarks for evaluating joint content-behavior understanding, encompassing both predictive and descriptive tasks.
The thesis has also made significant strides in demonstrating how behavioral signals can enhance content understanding. Our research showed substantial improvements across 46 different tasks spanning 23 benchmark datasets across language, audio, text, and video modalities. We proposed a scalable approach to enhance Vision Language Models (VLMs) without requiring significant architectural changes, making our improvements readily accessible to the broader research community. These results strongly validate our hypothesis that behavioral responses provide valuable signals for content understanding, opening new avenues for improving AI systems' comprehension capabilities.
In the realm of content generation, we made pioneering contributions in both text and visual domains. Through our work on memorability optimization, we developed Henry, which achieved a 44\% improvement in memorability scores through progressive generation techniques. This represents the first successful application of synthetic data in a domain previously lacking large-scale training resources. In the visual domain, we addressed the critical need for engagement-optimized image generation through the development of EngageNet and the creation of EngagingImageNet, a comprehensive dataset of 168 million tweets with associated media and engagement metrics. Our introduction of Engagement Arena, the first automated benchmark for assessing the engagement potential of text-to-image models, provides the research community with a valuable tool for evaluating and improving engagement-oriented image generation techniques.
Looking ahead, this research opens several promising directions for future work. The integration of behavioral data into AI systems could lead to more nuanced and context-aware models that better understand and predict human responses. There is significant potential for extending our approaches to other domains and modalities, particularly in areas where human engagement and response are crucial metrics of success. Additionally, our work on content generation optimization could be expanded to consider multiple behavioral objectives simultaneously, creating content that is not only engaging but also informative, memorable, and persuasive.
Finally, as we stand at the cusp of what we identified as the fourth major phase in the study of communication, driven by unprecedented access to digital content and behavioral data, our work provides a foundation for future researchers to build upon. The tools and methodologies we have developed demonstrate the potential for artificial intelligence to advance our understanding of human behavior and communication, while also highlighting the importance of maintaining a holistic view that encompasses all aspects of the communication process. As these technologies continue to evolve, they promise to provide even deeper insights into human behavior and more effective means of optimizing communication for various objectives.
Our contributions not only advance the field of behavioral science but also provide practical tools and insights for practitioners in marketing, content creation, and communication. By bridging the gap between theoretical understanding and practical application, this thesis lays groundwork for future innovations in both academic research and real-world applications of behavioral science and artificial intelligence.
This thesis has explored the intersection of communication theory, behavioral science, and artificial intelligence, with a particular focus on explaining, understanding, and optimizing human behavior through large-scale modeling approaches. Our work builds upon the fundamental seven-factor model of communication—communicator, message, channel, time of receipt, receiver, time of behavior, and receiver's behavior—while leveraging unprecedented access to digital behavioral data to advance both explanatory and predictive approaches to behavioral science.
In the domain of persuasion strategy analysis, we have made significant contributions to understanding the mechanisms of influence in advertising. Through comprehensive research spanning marketing, social psychology, and machine learning literature, we developed the most extensive framework of generic persuasion strategies to date. This work was supported by the creation and release of the first datasets for studying persuasion strategies in both image and video advertisements.


We found that existing Large Language Models (LLMs), despite their remarkable capabilities in various domains, are inherently limited in modeling behavior due to the systematic removal of behavioral data during training. To address this limitation, we developed the Large Content and Behavior Model (LCBM), which integrates all seven factors of communication to create more comprehensive models of human behavior. To support future research in this area, we released extensive behavior instruction fine-tuning data derived from over 40,000 YouTube videos and 168 million Twitter posts. Additionally, we established new benchmarks for evaluating joint content-behavior understanding, encompassing both predictive and descriptive tasks.


We also made significant strides in demonstrating how behavioral signals can enhance content understanding. Our research showed substantial improvements across 46 different tasks spanning 23 benchmark datasets across language, audio, text, and video modalities. We proposed a scalable approach to enhance Vision Language Models (VLMs) without requiring significant architectural changes, making our improvements readily accessible to the broader research community. These results strongly validate our hypothesis that behavioral responses provide valuable signals for content understanding, opening new avenues for improving AI systems' comprehension capabilities.

XXX

In the realm of content generation, we contributed methods for generating performant content in both the text and visual domains. Through our work on memorability optimization, we developed Henry, a model that improved the memorability scores of generated content by 44\% relative to the starting point. This represents the first successful application of synthetic data in a domain previously lacking large-scale training resources. In the visual domain, we addressed the critical need for engagement-optimized image generation through the development of EngageNet and the creation of EngagingImageNet, a comprehensive dataset of 168 million tweets with associated media and engagement metrics. Our introduction of Engagement Arena, the first automated benchmark for assessing the engagement potential of text-to-image models, provides the research community with a valuable tool for evaluating and improving engagement-oriented image generation techniques.


Looking ahead, this research opens several promising directions for future work. The integration of behavioral data into AI systems could lead to more nuanced and context-aware models that better understand and predict human responses. Concretely, we envision the following avenues for automated behavioral sciences in the near future:
\begin{enumerate}
\item \textbf{Infinite Personalization}: Before the invention of the printing press, each document had to be written by hand. Content production was the limiting factor in communication. The invention of the printing press made it possible to mass-produce content. However, delivery was still limited. While newspapers began to be printed, their area of influence was confined to a small geographical region. Delivery was the limiting factor then. Steam engines helped solve some of that problem. Still, the extent of delivery was limited, and the speed of delivery was slow. It was not until the invention of the internet and mobile devices that the delivery problem was completely solved. Now, anyone can instantly deliver any piece of content to any other person. The next limiting factor in communication is the time and human labor cost of producing content, which forces a communicator to send the same message to every receiver. Further, as both our work and several other studies have shown, humans are bad at predicting the behavior of others, so we need techniques to produce performant content. Such techniques will enable infinite personalization: a personalized way of communicating between a communicator and a receiver, with the aim of fulfilling their shared goals.



\item \textbf{Simulating Digital Humans and Digital Societies}: At the heart of social simulation lie two perspectives \cite{gilbert2005simulation}: 1) the dynamic feedback or interaction among individuals, and 2) the states of the population, either as a collective whole or as distinct groups. By simulating social activities, researchers and practitioners can predict the future evolution of individuals and groups. In addition, such simulations provide experimental environments in which interventions can be tested. Social simulation can be implemented in two forms: digital humans \cite{park2023generative,chopard1998cellular,Argyle_2023} and digital societies \cite{khandelwal2023large,bhattacharyyasocia2024,si2023long,khurana2023behavior,santurkar2023whose}. In digital human simulation, either human-crafted rules or parameterized models are used to depict the behavior of individuals (referred to as agents) who interact with others; in societal simulation, equations or models are used to model the society as a whole, including its non-linear interactions. The key to building these simulation models lies in leveraging the vast digital footprint left by these observable factors. Both physical and digital interactions contain these signals. For instance, consider a physical banner displayed by the political campaign of Kamala Harris saying ``For The People'' in a busy city such as San Francisco, where it is viewed by office-goers and receives various reactions such as hopeful comments, visible disdain, or cold indifference. Analogously, in the digital domain, a tweet by a figure like Donald Trump saying ``Make America Great Again'' receives likes, retweets, and comments, whether positive or negative. However, digital signals are far more accessible and are recorded in structured datasets, making them ideal for training a Foundation Model. Digital analytics has been recording such digital signals for decades.
Digital analytics involves collecting, analyzing, and interpreting data from digital platforms to capture user behavior. This data typically includes messages sent by a marketer in the form of websites, apps, or digital products and records actions such as clicks, page views, session durations, and navigation patterns, which provide insights into user behavior over a period of time. We have made some initial strides towards achieving this in our recent work \cite{bhattacharyyasocia2024}.


\item \textbf{Measuring persuasiveness and engagement potential of automated agents}: Large Language Models (LLMs) have demonstrated proficiency in content generation and, more recently, in human persuasion through the production of persuasive content \cite{durmus2024persuasion,singh2024measuring}. The development of systems capable of generating verifiably persuasive messages presents both opportunities and challenges for society. On the one hand, such systems could positively impact domains like advertising and social good, such as addressing vaccine hesitancy \cite{sekar2021domestic,PRWeek_DefeatDespairCOVID19}. On the other hand, these systems could have detrimental effects if used to influence political inclinations \cite{tappin2023quantifying}, propagate misinformation \cite{lukito2020coordinating}, or manipulate consumer choices \cite{boerman2017online}. Given these potential societal impacts, it is crucial to develop rigorous methods for studying, measuring, benchmarking, and monitoring the persuasive capabilities of AI models. We have made some initial strides towards achieving this in our recent works \cite{singh2024measuring,khurana2023behavior}.


\item \textbf{Automatically explaining human behavior}: The behavioral science communities are divided between prediction and explanation, and the two camps are growing farther apart. Yet the fundamental curiosity of humans is to learn more about themselves, their environment, and how it operates. Even if predictions become increasingly accurate, this curiosity is not satisfied as long as the underlying mechanism is not well understood. As a community, our ongoing commitment is to uncover the mechanisms underlying human behavior. To do so, however, we have to discover explanatory methods that both carry higher predictive power and are scalable. This may be solved in the future by using advanced tools such as simulations and data from natural experiments to bridge the gap between prediction and explanation.


% \item \textbf{Increasing use of natural experiments}: Most behavioral science still relies on causal study designs and experimentation. However, causal studies are limited because of the amount of data \cite{dunning2012natural,tan2014effect,singh2024measuring}
\end{enumerate}



Finally, as we stand at the cusp of what we identified as the fourth major phase in the study of communication, driven by unprecedented access to digital content and behavioral data, we should remember these sayings:

\textit{We're actually much better at planning the flight path of an interplanetary rocket (rocket science) than we are at managing the economy, merging two corporations, or even predicting how many copies of a book will sell (behavior prediction). So why is it that rocket science \textbf{seems} hard, whereas problems having to do with people - which arguably are much harder - seem like they ought to be \textbf{just} a matter of common sense (easily predictable)?} - Duncan J. Watts

\begin{center}
And,
\end{center}

\textit{Nothing in Nature is random (unpredictable). A thing appears random only through the incompleteness of our knowledge (ignorance).} - Baruch Spinoza