Exploratory Data Analysis of the Scraped Horoscopes
We conducted exploratory data analysis on the 33,142 scraped horoscopes.

The shortest horoscope is for the Cancer zodiac sign:
And for Capricorn:
The longest horoscope is for Aries:
Sentiment Analysis
The model used for sentiment classification is distilroberta-finetuned-financial-news-sentiment-analysis.

Overall Distribution: Neutral sentiment dominates (68.6%), followed by positive sentiment (21.9%) and negative sentiment (9.5%). Horoscopes generally aim to provide balanced or neutral content, avoiding strong negativity to maintain reader engagement and satisfaction. Positive content may be included to uplift readers, while negative content is minimal to prevent alienation.
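The labeling step and the overall distribution can be sketched roughly as follows. This is a minimal illustration, not the script actually used: the Hub id in `MODEL_ID` (including the `mrm8488/` namespace) is an assumption, since the text gives only the model name, and the helper names `load_classifier`, `label_texts`, and `sentiment_shares` are hypothetical.

```python
from collections import Counter

# Assumed Hugging Face Hub id; the text names the model without a namespace.
MODEL_ID = "mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis"


def load_classifier(model_id=MODEL_ID):
    """Build a text-classification pipeline (requires the `transformers` package)."""
    # Imported lazily so the helpers below can be used without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id, truncation=True)


def label_texts(texts, classifier):
    """Return one lowercase sentiment label per text.

    `classifier` is any callable that maps a list of strings to a list of
    dicts with a "label" key, which is the shape a transformers pipeline returns.
    """
    return [result["label"].lower() for result in classifier(list(texts))]


def sentiment_shares(labels):
    """Percentage of each label, rounded to one decimal place."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}
```

With the real model, usage would look like `sentiment_shares(label_texts(horoscopes, load_classifier()))`, where `horoscopes` is a hypothetical list of the scraped texts.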

Sentiment by Source:
- Source №3 has the highest proportion of positive sentiment (69.7%) and the lowest neutral sentiment (24.2%). Sources like №3 may focus on optimism and encouragement, aligning with their audience's expectations for uplifting content.
- Source №7 has the highest proportion of neutral sentiment (83.9%) and the lowest positive sentiment (10.4%). №7 could emphasize neutrality to reflect the interpretive and ambiguous nature of tarot card readings.
- Sources №1 and №5 are relatively balanced, with a notable emphasis on positivity (~27% positive).
- Sources №4 and №6 skew more neutral (79.5% and 73.2% neutral, respectively). №4 and №6 might prioritize neutrality to appeal to a broader audience, avoiding strong stances that could alienate readers.
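Per-source percentages like those above can be computed with a small grouping helper. The sketch below uses only the standard library; the record field names (`source`, `sign`, `month`, `sentiment`) and the helper `shares_by_group` are hypothetical, and the records are toy values for illustration, not the scraped data.

```python
from collections import Counter, defaultdict


def shares_by_group(records, group_key, label_key="sentiment"):
    """Percentage of each label within each group, rounded to one decimal place."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record[group_key]][record[label_key]] += 1
    return {
        group: {label: round(100 * n / sum(c.values()), 1) for label, n in c.items()}
        for group, c in counts.items()
    }


# Toy records for illustration only.
records = [
    {"source": "A", "sign": "leo", "month": 12, "sentiment": "positive"},
    {"source": "A", "sign": "virgo", "month": 7, "sentiment": "neutral"},
    {"source": "A", "sign": "leo", "month": 7, "sentiment": "neutral"},
    {"source": "A", "sign": "pisces", "month": 1, "sentiment": "neutral"},
    {"source": "B", "sign": "leo", "month": 12, "sentiment": "positive"},
    {"source": "B", "sign": "scorpio", "month": 1, "sentiment": "negative"},
]

by_source = shares_by_group(records, "source")
print(by_source)
```

The same helper covers the breakdowns by zodiac sign and by month by passing `"sign"` or `"month"` as `group_key`.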

Sentiment distribution across Zodiac Signs is remarkably consistent:
- Neutral sentiment ranges from 67.3% (Virgo) to 69.9% (Pisces).
- Positive sentiment is highest for Leo (23.3%) and Virgo (22.8%).
- Negative sentiment is slightly higher for Scorpio (10.2%) and Sagittarius (10.0%).

The consistent sentiment distribution suggests that horoscope content creators aim for uniformity to avoid favoritism or bias among zodiac signs. Slight variations in positivity and negativity might reflect stereotypical traits associated with signs (e.g., Leo's confidence and optimism aligning with higher positivity).

Sentiment distribution across months:
- December has the highest proportion of positive sentiment (25.6%) and the lowest negative sentiment (7.3%). The high positivity in December could be attributed to the festive holiday season, where uplifting content resonates with readers.
- July has the lowest positive sentiment (18.6%) and the highest neutral sentiment (71.3%). July might reflect a period of introspection or neutrality, possibly tied to mid-year assessments or summer vacations.
- January shows higher negativity (10.9%) compared to other months. Higher negativity in January could align with the post-holiday blues or the pressure of New Year's resolutions.

Key Takeaways:
- Neutral sentiment is a strategic choice for maintaining inclusivity and appeal across a diverse audience.
- Sentiment shifts slightly with the calendar, reflecting cultural and psychological factors tied to specific months.
- Each source adopts a unique tone, catering to its target audience, with some emphasizing positivity (e.g., №3) and others focusing on neutrality (e.g., №7).
Category Classification
For category classification, we used a script that runs the BAAI/bge-reranker-v2-m3 model through FlagReranker to classify horoscopes into 6 categories (love and relationships, work and career, health and emotional well-being, finances, creativity and innovation, general life advice or timing). The reranker computes a relevance (similarity) score between the horoscope text and each category query, and the query with the highest score determines the horoscope's category.
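The highest-score rule described above can be sketched as follows. This is an illustration under assumptions rather than the original script: the category names are used directly as queries (the actual query wording is not given), and the helpers `load_scorer` and `classify` are hypothetical. The `FlagReranker` class and its `compute_score` method come from the `FlagEmbedding` package.

```python
CATEGORIES = [
    "love and relationships",
    "work and career",
    "health and emotional well-being",
    "finances",
    "creativity and innovation",
    "general life advice or timing",
]


def load_scorer(model_id="BAAI/bge-reranker-v2-m3"):
    """Return a function that scores [query, passage] pairs (requires `FlagEmbedding`)."""
    # Imported lazily so `classify` can be exercised with any scoring function.
    from FlagEmbedding import FlagReranker

    reranker = FlagReranker(model_id, use_fp16=True)
    return reranker.compute_score


def classify(text, score_pairs, categories=CATEGORIES):
    """Pick the category whose query scores highest against the text.

    `score_pairs` takes a list of [query, passage] pairs and returns one
    score per pair, in the same order.
    """
    pairs = [[category, text] for category in categories]
    scores = score_pairs(pairs)
    best = max(range(len(categories)), key=lambda i: scores[i])
    return categories[best]
```

With the real model, usage would look like `classify(horoscope_text, load_scorer())`. Loading the scorer once and reusing it across all horoscopes avoids reloading the model for every text.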
Overall Category Distribution:
- Love and relationships dominate (32.5%), followed by work and career (24.3%) and general life advice or timing (19.9%). The prominence of "love and relationships" reflects its universal appeal and relevance to readers, as horoscopes often cater to personal and emotional concerns. "Work and career" follows closely, highlighting the importance of professional guidance for readers.
- Health and emotional well-being (12.5%) and finances (6.1%) are less emphasized, with creativity and innovation being the least common (4.6%). The lower emphasis on "finances" and "creativity and innovation" suggests these areas might be considered niche or less critical for general horoscope readers.

Categories by Source:
- №3 focuses heavily on work and career (36.4%) and love and relationships (28.8%), with no mentions of finances.
- №4 prioritizes love and relationships (41.9%), with a balanced emphasis on health and emotional well-being (13.8%) and general life advice (13.1%). №3 and №4 target readers looking for career and relationship guidance, aligning with their respective audiences' expectations.
- №7 emphasizes love and relationships (36.2%) and health and emotional well-being (22.6%), with minimal focus on finances (1.6%). The weight given to emotional and mental well-being is consistent with its mystical and introspective nature.
- №6 is unique in prioritizing general life advice or timing (39.6%), with less focus on love and relationships (25.2%).

Categories by Zodiac Sign. The consistency across signs indicates a deliberate effort to provide balanced guidance to all zodiac groups. Slight variations align with traditional astrological traits, such as Libra's focus on relationships and Capricorn's association with career ambition:
- Love and relationships dominate across all zodiac signs, ranging from 28.5% (Capricorn) to 35.4% (Libra).
- Work and career is consistently second, with Capricorn (26.8%) and Virgo (25.6%) showing the highest proportions.
- Health and emotional well-being is slightly higher for Pisces (14.6%) and Cancer (12.8%), potentially reflecting their emotional and nurturing archetypes.

Categories by Month. Seasonal trends in horoscope categories align with cultural and psychological patterns:
- Love and relationships peak in February (33.9%), reflecting the influence of Valentine's Day.
- Work and career is emphasized in January (25.6%), at the start of the year, aligning with New Year's resolutions and career planning.
- General life advice or timing is highest in March (21.4%) and April (23.1%), suggesting a focus on transitions during spring.
- Finances peak in December (7.3%), likely tied to end-of-year financial planning and holiday spending.

Key Takeaways:
- Universal themes dominate: "love and relationships" and "work and career" are the most common categories, appealing to readers' core concerns.
- Different sources emphasize unique themes, catering to their target audience's preferences.
- Categories shift with the calendar year, reflecting societal and cultural patterns (e.g., love in February, career in January).
- While category proportions are consistent across zodiac signs, slight variations align with astrological stereotypes.