Update README.md

Version 3.1
Update telegram-scraper.py
2026-04-11 23:38:09 +02:00 · 2025-12-12 15:38:09 +01:00 · 2025-09-11 17:34:59 +02:00 · 2025-09-11 17:34:30 +02:00 · 2025-09-11 17:32:56 +02:00 · 2025-07-20 20:18:12 +02:00
3 changed files with 1055 additions and 407 deletions
--- a/README.md
+++ b/README.md
@@ -1,5 +1,17 @@
 # Telegram Channel Scraper 📱

+> **⚠️ DISCONTINUED**
+>
+> This project is no longer maintained. After a lot of support and interest from the community, a far more capable successor has been released:
+>
+> **➜ [Harrier — Telegram Scraping & Intelligence Platform](https://github.com/skuggrev/harrier)**
+>
+> Harrier has everything this tool had and much more: web UI, real-time progress, user lookup, webhook alerts, continuous scraping, and a proper export system. I recommend switching over.
+>
+> A huge thank you to everyone who used, starred, and supported this project.
+
+---
+
 A powerful Python script that allows you to scrape messages and media from Telegram channels using the Telethon library. Features include real-time continuous scraping, media downloading, and data export capabilities.

 ```
@@ -10,17 +22,43 @@ ___________________  _________
  |____|  \______  /_______  /
                 \/        \/
 ```
+
+## What's New in v3.1 🎉
+
+**Enhanced Message Data:**
+
+- **Message statistics** - Captures views, forwards, and post_author for each message
+- **Reactions support** - Records all emoji reactions with counts (e.g., "😀 12 👍 3")
+- **Automatic database migration** - Seamlessly adds new columns to existing databases
+- **Richer exports** - All new data included in CSV/JSON exports
+
+**Improved Channel Management:**
+
+- **Channel names displayed** - Shows channel names alongside IDs everywhere
+- **Smart filtering** - List option now only shows Channels and Groups (no private chats)
+- **channels_list.csv export** - Automatically saves channel list with names, IDs, usernames, and types
+- **"all" selection** - Quickly add all listed channels at once
+- **Better export naming** - Files now named as `ID_username.csv` and `ID_username.json`
+
+**Bug Fixes:**
+
+- **Fixed channel ID parsing** - Resolved "invalid literal for int()" error in fix missing media
+- **Better entity resolution** - Handles both numeric IDs and channel usernames
+- **Improved error messages** - Shows channel names with IDs for clearer debugging
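
As a rough illustration of the reactions format listed above, the sketch below joins emoji and count pairs into a single string such as "😀 12 👍 3". The function name `format_reactions` and the list-of-tuples input are assumptions made for this example; how the script reads reactions from Telegram is not shown in this diff.

```python
def format_reactions(counts):
    """Join (emoji, count) pairs into one space-separated string.

    `counts` is a hypothetical input shape: a list of (emoji, count) tuples.
    Pairs with a zero count are skipped.
    """
    return " ".join(f"{emoji} {count}" for emoji, count in counts if count > 0)


print(format_reactions([("😀", 12), ("👍", 3)]))
```

Storing reactions as one text value keeps them in a single database column and a single CSV cell, at the cost of needing to split the string again for analysis.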
+
 ## Features 🚀

- Scrape messages from multiple Telegram channels
- Download media files (photos, documents)
+- **QR Code & Phone Authentication** - Choose your preferred login method
+- Scrape messages with full metadata (views, forwards, reactions, post author)
+- Download media files with parallel processing and unique naming
 - Real-time continuous scraping
- Export data to JSON and CSV formats
- SQLite database storage
+- Export data to JSON and CSV formats with enhanced metadata
+- SQLite database storage with automatic schema migration
 - Resume capability (saves progress)
- Media reprocessing for failed downloads
- Progress tracking
- Interactive menu interface
+- Interactive menu with channel names and numbered selection
+- Smart channel filtering (only shows channels/groups)
+- Progress tracking with visual progress bars
+- Automatic channels list export to CSV

 ## Prerequisites 📋

@@ -36,13 +74,6 @@ Before running the script, you'll need:
 pip install -r requirements.txt
 ```

-Contents of `requirements.txt`:
-```
-telethon
-aiohttp
-asyncio
-```
-
 ## Getting Telegram API Credentials 🔑

 1. Visit https://my.telegram.org/auth
@@ -63,134 +94,123 @@ Keep these credentials safe, you'll need them to run the script!
 ## Setup and Running 🔧

 1. Clone the repository:
+
 ```bash
 git clone https://github.com/unnohwn/telegram-scraper.git
 cd telegram-scraper
 ```

 2. Install requirements:
+
 ```bash
 pip install -r requirements.txt
 ```

 3. Run the script:
+
 ```bash
 python telegram-scraper.py
 ```

 4. On first run, you'll be prompted to enter:
-   - Your API ID
-   - Your API Hash
-   - Your phone number (with country code)
-   - Your phone number (with country code) or bot, but use the phone number option when prompted a second time.
-   - Verification code (sent to your Telegram)
-
-## Initial Scraping Behavior 🕒
-
-When scraping a channel for the first time, please note:
-
- The script will attempt to retrieve the entire channel history, starting from the oldest messages
- Initial scraping can take several minutes or even hours, depending on:
-  - The total number of messages in the channel
-  - Whether media downloading is enabled
-  - The size and number of media files
-  - Your internet connection speed
-  - Telegram's rate limiting
- The script uses pagination and maintains state, so if interrupted, it can resume from where it left off
- Progress percentage is displayed in real-time to track the scraping status
- Messages are stored in the database as they are scraped, so you can start analyzing available data even before the scraping is complete
+   - Your API ID (from my.telegram.org)
+   - Your API Hash (from my.telegram.org)
+   - **Choose authentication method:**
+     - **QR Code** (Recommended) - Scan with your phone (no phone number needed)
+     - **Phone Number** - Traditional SMS verification

 ## Usage 📝

-The script provides an interactive menu with the following options:
+The script provides a clean interactive menu:

- **[A]** Add new channel
-  - Enter the channel ID or channelname
- **[R]** Remove channel
-  - Remove a channel from scraping list
- **[S]** Scrape all channels
-  - One-time scraping of all configured channels
- **[M]** Toggle media scraping
-  - Enable/disable downloading of media files
- **[C]** Continuous scraping
-  - Real-time monitoring of channels for new messages
- **[E]** Export data
-  - Export to JSON and CSV formats
- **[V]** View saved channels
-  - List all saved channels
- **[L]** List account channels
-  - List all channels with their IDs for the account
- **[Q]** Quit
+```
+========================================
+           TELEGRAM SCRAPER
+========================================
+[S] Scrape channels
+[C] Continuous scraping  
+[M] Media scraping: ON
+[L] List & add channels
+[R] Remove channels
+[E] Export data
+[T] Rescrape media
+[Q] Quit
+========================================
+```

-### Channel IDs 📢
+### Channel Selection Made Easy 🔢

-You can use either:
- Channel username (e.g., `channelname`)
- Channel ID (e.g., `-1001234567890`)
+Instead of typing long channel IDs, use numbers:
+
+**Adding Channels:**
+
+```
+[1] Tech News (ID: -1002116176890, Type: Channel, Username: @technews)
+[2] Python Dev (ID: -1001597139842, Type: Group, Username: @pythondev)
+[3] Daily Updates (ID: -1002274713954, Type: Channel, Username: @dailyupdates)
+
+Enter: 1,3 (adds channels 1 and 3)
+Or: all (adds all listed channels)
+```
+
+**Viewing Your Channels:**
+
+```
+[1] Tech News (ID: -1002116176890), Last Message ID: 5234, Messages: 12450
+[2] Python Dev (ID: -1001597139842), Last Message ID: 8192, Messages: 45782
+```
+
+**Scraping Channels:**
+
+- Single: `1`
+- Multiple: `1,3,5`
+- All: `all`
+- Mix formats: `1,-1001597139842,3`
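
The selection formats above could be parsed along these lines. This is a minimal sketch, not the script's actual code: the name `parse_selection` is hypothetical, and it assumes that tokens starting with `-` are raw channel IDs while all other numbers are 1-based positions in the displayed list.

```python
def parse_selection(text, channel_ids):
    """Turn input such as "1,3", "all" or "1,-1001597139842,3" into channel IDs.

    `channel_ids` is the list of IDs in the order they were displayed.
    """
    text = text.strip()
    if text.lower() == "all":
        return list(channel_ids)

    selected = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas such as "1,,3"
        if token.startswith("-"):
            chosen = int(token)  # assumed to be a raw channel ID
        else:
            index = int(token)  # assumed to be a list position
            if not 1 <= index <= len(channel_ids):
                raise ValueError(f"No channel numbered {index}")
            chosen = channel_ids[index - 1]
        if chosen not in selected:
            selected.append(chosen)  # keep order, drop duplicates
    return selected


ids = [-1002116176890, -1001597139842, -1002274713954]
print(parse_selection("1,3", ids))
```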

 ## Data Storage 💾

 ### Database Structure

 Data is stored in SQLite databases, one per channel:
+
 - Location: `./channelname/channelname.db`
- Table: `messages`
-  - `id`: Primary key
-  - `message_id`: Telegram message ID
-  - `date`: Message timestamp
-  - `sender_id`: Sender's Telegram ID
-  - `first_name`: Sender's first name
-  - `last_name`: Sender's last name
-  - `username`: Sender's username
-  - `message`: Message text
-  - `media_type`: Type of media (if any)
-  - `media_path`: Local path to downloaded media
-  - `reply_to`: ID of replied message (if any)
+- Optimized with indexes for fast queries
+- WAL mode for better performance
+- Schema includes: message_id, date, sender info, message text, media info, reply_to, post_author, views, forwards, reactions
+- Automatic migration adds new columns to existing databases
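
One common way to add columns to an existing SQLite table is to compare `PRAGMA table_info` against the wanted columns and issue `ALTER TABLE ... ADD COLUMN` for the missing ones. The sketch below shows that pattern under assumptions: the function name `migrate` and the column types are guesses for illustration, and the script's real migration code is not shown in this diff.

```python
import sqlite3

# Column names come from the schema list above; the SQL types are assumed.
NEW_COLUMNS = {
    "post_author": "TEXT",
    "views": "INTEGER",
    "forwards": "INTEGER",
    "reactions": "TEXT",
}


def migrate(conn):
    """Add any missing columns to `messages` and return the names added."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute("PRAGMA table_info(messages)")}
    added = []
    for name, sql_type in NEW_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE messages ADD COLUMN {name} {sql_type}")
            added.append(name)
    conn.commit()
    return added


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, message_id INTEGER)")
print(migrate(conn))  # columns added on the first run
print(migrate(conn))  # nothing left to add on the second run
```

Because the check runs against the live schema, calling it on every startup is harmless: an up-to-date database is left untouched.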

 ### Media Storage 📁

-Media files are stored in:
+Media files are stored with unique naming:
+
 - Location: `./channelname/media/`
- Files are named using message ID or original filename
+- Format: `{message_id}-{unique_id}-{original_name}.ext`
+- **No more file overwrites** - Each file gets a unique name
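
A filename in the format above might be built as in this sketch. The helper name `unique_media_name` is hypothetical, and so is the fallback of a short random token when no unique ID is supplied; the diff does not say where the script takes its unique ID from.

```python
import uuid
from pathlib import Path


def unique_media_name(message_id, original_name, unique_id=None):
    """Build `{message_id}-{unique_id}-{original_name}` for a media file."""
    # Assumption: fall back to a random 8-character token if no ID is given.
    token = unique_id or uuid.uuid4().hex[:8]
    # Keep only the final path component so the name cannot escape the media folder.
    safe_name = Path(original_name).name
    return f"{message_id}-{token}-{safe_name}"


print(unique_media_name(5234, "photo.jpg", unique_id="abc123"))
```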

 ### Exported Data 📊

-Data can be exported in two formats:
-1. **CSV**: `./channelname/channelname.csv`
-   - Human-readable spreadsheet format
-   - Easy to import into Excel/Google Sheets
+Export formats:

-2. **JSON**: `./channelname/channelname.json`
-   - Structured data format
-   - Ideal for programmatic processing
+1. **CSV**: `./channelname/channelid_username.csv`
+2. **JSON**: `./channelname/channelid_username.json`
+3. **Channel List**: `./channels_list.csv` (automatically created when using [L] option)

-## Features in Detail 🔍
+All exports include complete message metadata: views, forwards, reactions, and post author information.

-### Continuous Scraping
+## Performance Features ⚙️

-The continuous scraping feature (`[C]` option) allows you to:
- Monitor channels in real-time
- Automatically download new messages
- Download media as it's posted
- Run indefinitely until interrupted (Ctrl+C)
- Maintains state between runs
-
-### Media Handling
-
-The script can download:
- Photos
- Documents
- Other media types supported by Telegram
- Automatically retries failed downloads
- Skips existing files to avoid duplicates
+- **5 concurrent downloads** for faster media processing
+- **Batch database operations** for optimal speed
+- **Progress bars** with real-time feedback
+- **Resume capability** - Continue where you left off
+- **Memory-efficient** exports for large datasets
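
Capping the number of simultaneous downloads is typically done with an `asyncio.Semaphore`. The sketch below shows the general pattern only: `download_all` and the `download_one` callback are placeholder names, and whether the script uses a semaphore for its 5 concurrent downloads is an assumption.

```python
import asyncio

MAX_CONCURRENT = 5  # matches the limit stated above


async def download_all(items, download_one, limit=MAX_CONCURRENT):
    """Run `download_one(item)` for every item, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        async with semaphore:  # waits here while `limit` downloads are running
            return await download_one(item)

    # gather() returns results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_download(item):
    # Stand-in for a real media download.
    await asyncio.sleep(0.01)
    return f"file-{item}"


print(asyncio.run(download_all(range(3), fake_download)))
```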

 ## Error Handling 🛠️

-The script includes:
- Automatic retry mechanism for failed media downloads
- State preservation in case of interruption
- Flood control compliance
- Error logging for failed operations
+- Automatic retry with exponential backoff
+- Rate limit compliance
+- Network error recovery
+- State preservation during interruptions
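
Retry with exponential backoff generally means doubling the wait after each failed attempt. The following is a minimal sketch under assumptions: `with_retries`, the attempt count, the base delay, and the choice to retry on `OSError` (which covers connection and timeout errors) are all illustrative and not taken from the script.

```python
import asyncio


async def with_retries(operation, attempts=4, base_delay=1.0, sleep=asyncio.sleep):
    """Await `operation()`, retrying on network-style errors.

    Waits base_delay, then 2x, then 4x, and so on between attempts.
    The last failure is re-raised to the caller.
    """
    for attempt in range(attempts):
        try:
            return await operation()
        except OSError:  # includes ConnectionError and TimeoutError
            if attempt == attempts - 1:
                raise
            await sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists only so the delays can be replaced in tests; in normal use the default `asyncio.sleep` applies.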

 ## Limitations ⚠️

@@ -198,10 +218,6 @@ The script includes:
 - Can only access public channels or channels you're a member of
 - Media download size limits apply as per Telegram's restrictions

-## Contributing 🤝
-
-Contributions are welcome! Please feel free to submit a Pull Request.
-
 ## License 📄

 This project is licensed under the MIT License - see the LICENSE file for details.
@@ -209,6 +225,7 @@ This project is licensed under the MIT License - see the LICENSE file for detail
 ## Disclaimer ⚖️

 This tool is for educational purposes only. Make sure to:
+
 - Respect Telegram's Terms of Service
 - Obtain necessary permissions before scraping
 - Use responsibly and ethically
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,3 +1,15 @@
-telethon
-aiohttp
-asyncio
+aiohappyeyeballs==2.6.1
+aiohttp==3.12.14
+aiosignal==1.4.0
+asyncio==3.4.3
+attrs==25.3.0
+frozenlist==1.7.0
+idna==3.10
+multidict==6.6.3
+propcache==0.3.2
+pyaes==1.6.1
+pyasn1==0.6.1
+qrcode==8.0
+rsa==4.9.1
+Telethon==1.40.0
+yarl==1.20.1
--- a/telegram-scraper.py
+++ b/telegram-scraper.py
Author	SHA1	Message	Date
𝓾𝓷𝓷𝓸𝓱𝔀𝓷	c84141674a	Update README.md	2026-04-11 23:38:09 +02:00
𝓾𝓷𝓷𝓸𝓱𝔀𝓷	fb7ad3742e	Version 3.1	2025-12-12 15:38:09 +01:00
𝓾𝓷𝓷𝓸𝓱𝔀𝓷	8d4e092b1b	Update telegram-scraper.py v3.0	2025-09-11 17:34:59 +02:00
𝓾𝓷𝓷𝓸𝓱𝔀𝓷	e7bf2b1ed7	Update requirements.txt	2025-09-11 17:34:30 +02:00
𝓾𝓷𝓷𝓸𝓱𝔀𝓷	7db46018ce	Update README.md	2025-09-11 17:32:56 +02:00
Robert Aitch	65b221ade6	Update requirements.txt	2025-07-20 20:18:12 +02:00
Robert Aitch	ac7d6de06b	Performance improvements major performance overhaul with 5-10x speed improvements	2025-07-20 00:57:54 +02:00
Robert Aitch	57bf125ca1	Delete gai.py	2025-07-20 00:36:53 +02:00
Robert Aitch	f383f222c4	Update README.md	2025-07-20 00:35:41 +02:00
𝓾𝓷𝓷𝓸𝓱𝔀𝓷	6273c9c11c	Merge pull request #12 from chaseyoungcn/main get total_messages speed up	2025-07-18 10:33:00 +02:00
fxxk-research	85d3f0f935	Rename gai to gai.py rename	2025-06-26 13:36:58 +08:00
fxxk-research	30bda684fe	Update gai filter no message channel	2025-06-26 13:36:15 +08:00
fxxk-research	aa9b756d37	Create gai Improve progress bar and logging system	2025-06-23 11:03:53 +08:00
fxxk-research	6baf4bdd13	get total_messages speed up O(n) -> O(1)	2025-06-19 20:42:10 +08:00