# Telegram Channel Scraper 📱

A Python script that scrapes messages and media from Telegram channels using the Telethon library. It supports real-time continuous scraping, media downloading, and data export to JSON and CSV.

```
___________________  _________
\__    ___/  _____/ /   _____/
  |    | /   \  ___ \_____  \
  |    | \    \_\  \/        \
  |____|  \______  /_______  /
                 \/        \/
```

## What's New in v3.0 🎉

**QR Code Authentication:**
- **No phone number required** - Log in by scanning a QR code (API credentials are still required)
- **Faster authentication** - Just scan with your phone after API setup
- **Secure login** - Recommended authentication method
- **2FA support** for both QR and phone methods

**Enhanced User Experience:**
- **Numbered channel selection** - Use 1,2,3 instead of full channel IDs
- **Multi-channel operations** - Add, remove, and scrape multiple channels at once
- **Streamlined menu** - Cleaner interface with fewer redundant options
- **Progress bars** for media downloads with visual feedback

**Media Download Improvements:**
- **Fixed file overwriting** - Unique naming prevents media files from being overwritten
- **5 concurrent downloads** - Increased from 3 to 5 for faster media processing
- **Better error handling** - Improved retry logic and recovery

**Performance & Stability:**
- **Database optimizations** - WAL mode and faster operations
- **Hidden warnings** - Cleaner output without technical messages
- **Better error recovery** - More robust handling of network issues

## Features 🚀

- **QR Code & Phone Authentication** - Choose your preferred login method
- Scrape messages from multiple Telegram channels
- Download media files with parallel processing and unique naming
- Real-time continuous scraping
- Export data to JSON and CSV formats
- SQLite database storage with optimized performance
- Resume capability (saves progress)
- Interactive menu with numbered channel selection
- Progress tracking with visual progress bars

## Prerequisites 📋

Before running the script, you'll need:

- Python 3.7 or higher
- Telegram account
- API credentials from Telegram

### Required Python packages

```bash
pip install -r requirements.txt
```

## Getting Telegram API Credentials 🔑

1. Visit https://my.telegram.org/auth
2. Log in with your phone number
3. Click on "API development tools"
4. Fill in the form:
   - App title: Your app name
   - Short name: Your app short name
   - Platform: Can be left as "Desktop"
   - Description: Brief description of your app
5. Click "Create application"
6. You'll receive:
   - `api_id`: A number
   - `api_hash`: A string of letters and numbers

Keep these credentials safe; you'll need them to run the script.

## Setup and Running 🔧

1. Clone the repository:
```bash
git clone https://github.com/unnohwn/telegram-scraper.git
cd telegram-scraper
```

2. Install requirements:
```bash
pip install -r requirements.txt
```

3. Run the script:
```bash
python telegram-scraper.py
```

4. On first run, you'll be prompted to enter:
   - Your API ID (from my.telegram.org)
   - Your API Hash (from my.telegram.org)
   - **Choose authentication method:**
     - **QR Code** (Recommended) - Scan with your phone (no phone number needed)
     - **Phone Number** - Traditional SMS verification

## Usage 📝

The script provides a clean interactive menu:

```
========================================
TELEGRAM SCRAPER
========================================
[S] Scrape channels
[C] Continuous scraping
[M] Media scraping: ON
[L] List & add channels
[R] Remove channels
[E] Export data
[T] Rescrape media
[Q] Quit
========================================
```

### Channel Selection Made Easy 🔢

Instead of typing long channel IDs, use numbers:

**Adding Channels:**
```
[1] The News (Chat) (id: -1002116176890)
[2] Python Channel (id: -1001597139842)
[3] The Corner (id: -1002274713954)

Enter: 1,3 (adds channels 1 and 3)
```

**Scraping Channels:**
- Single: `1`
- Multiple: `1,3,5`
- All: `all`
- Mixed numbers and channel IDs: `1,-1001597139842,3`
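
The selection formats above can be resolved to channel IDs roughly as follows. This is a minimal sketch, not the script's actual parser: `parse_selection` and the `channels` list are hypothetical names, and it assumes channel IDs are held as integers in menu order.

```python
def parse_selection(raw, channel_ids):
    """Resolve input such as '1,3', 'all', or '1,-1001597139842' to channel IDs.

    Hypothetical helper: `channel_ids` is the list of saved channel IDs in the
    same order as the numbered menu.
    """
    raw = raw.strip()
    if raw.lower() == "all":
        return list(channel_ids)

    selected = []
    for token in raw.split(","):
        token = token.strip()
        try:
            value = int(token)
        except ValueError:
            continue  # skip empty or non-numeric tokens
        if 1 <= value <= len(channel_ids):
            channel_id = channel_ids[value - 1]  # menu number (1-based)
        elif value in channel_ids:
            channel_id = value  # full channel ID typed directly
        else:
            continue  # unknown number or ID
        if channel_id not in selected:
            selected.append(channel_id)  # keep order, drop duplicates
    return selected


channels = [-1002116176890, -1001597139842, -1002274713954]
print(parse_selection("1,3", channels))
print(parse_selection("1,-1001597139842,3", channels))
```

Because Telegram channel IDs in the examples above are negative, a positive number can be treated as a menu position and a negative one as a full ID without ambiguity.
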

## Data Storage 💾

### Database Structure

Data is stored in SQLite databases, one per channel:
- Location: `./channelname/channelname.db`
- Optimized with indexes for fast queries
- WAL mode for better performance
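
As an illustration of the layout above, the following sketch opens a per-channel database with WAL mode and one index. The `open_channel_db` helper, the `messages` table, and its columns are assumptions made for this example; the script's real schema may differ.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_channel_db(channel_name, base_dir="."):
    """Open ./<channel>/<channel>.db with WAL mode enabled (hypothetical helper)."""
    db_dir = Path(base_dir) / channel_name
    db_dir.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_dir / f"{channel_name}.db")
    # WAL lets readers keep working while a writer appends new messages.
    conn.execute("PRAGMA journal_mode=WAL")
    # Assumed schema, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "message_id INTEGER PRIMARY KEY, date TEXT, message TEXT)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_messages_date ON messages(date)")
    conn.commit()
    return conn


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    conn = open_channel_db("examplechannel", base_dir=tmp)
    journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    conn.close()
print(journal_mode)
```
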

### Media Storage 📁

Media files are stored with unique naming:
- Location: `./channelname/media/`
- Format: `{message_id}-{unique_id}-{original_name}.ext`
- **No more file overwrites** - Each file gets a unique name
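
A file name in the format above could be assembled like this. It is a sketch under assumptions: `build_media_path` is a hypothetical helper, and the random 8-character `unique_id` is only one way to get a unique component; the README does not say how the script generates it.

```python
import uuid
from pathlib import Path


def build_media_path(channel_name, message_id, original_name, unique_id=None):
    """Return ./<channel>/media/<message_id>-<unique_id>-<original_name> (hypothetical)."""
    # Assumption: a short random hex string serves as the unique part.
    unique_id = unique_id or uuid.uuid4().hex[:8]
    # Keep only the file name so a stray directory component cannot escape media/.
    name = Path(original_name).name
    return Path(channel_name) / "media" / f"{message_id}-{unique_id}-{name}"


print(build_media_path("examplechannel", 42, "photo.jpg", unique_id="ab12cd34").as_posix())
```

Two attachments named `photo.jpg` on different messages, or even on the same message, then map to different paths because the message ID and the unique component both differ.
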

### Exported Data 📊

Export formats:
1. **CSV**: `./channelname/channelname.csv`
2. **JSON**: `./channelname/channelname.json`
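
One way to produce both export formats without loading a whole channel into memory is to stream rows straight from SQLite. The sketch below assumes a `messages` table with `message_id`, `date`, and `message` columns; those names, and the `export_csv` / `export_json` helpers, are illustrative rather than taken from the script.

```python
import csv
import io
import json
import sqlite3

QUERY = "SELECT message_id, date, message FROM messages ORDER BY message_id"


def export_csv(conn, out, batch_size=1000):
    """Write rows to `out` in batches so memory use stays flat."""
    cursor = conn.execute(QUERY)
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cursor.description])  # header row
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        writer.writerows(rows)


def export_json(conn, out):
    """Write a JSON array one object at a time instead of building a big list."""
    cursor = conn.execute(QUERY)
    columns = [col[0] for col in cursor.description]
    out.write("[")
    for index, row in enumerate(cursor):
        if index:
            out.write(",")
        out.write(json.dumps(dict(zip(columns, row)), ensure_ascii=False))
    out.write("]")


# Demo with an in-memory database and in-memory output buffers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (message_id INTEGER PRIMARY KEY, date TEXT, message TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [(1, "2024-01-01", "hello"), (2, "2024-01-02", "world")],
)
csv_out, json_out = io.StringIO(), io.StringIO()
export_csv(conn, csv_out)
export_json(conn, json_out)
conn.close()
print(csv_out.getvalue())
print(json_out.getvalue())
```

In real use, `out` would be a file opened at `./channelname/channelname.csv` or `./channelname/channelname.json` (with `newline=""` for the CSV file, as the `csv` module recommends).
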

## Performance Features ⚙️

- **5 concurrent downloads** for faster media processing
- **Batch database operations** for optimal speed
- **Progress bars** with real-time feedback
- **Resume capability** - Continue where you left off
- **Memory-efficient** exports for large datasets
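
The concurrency cap can be pictured with an `asyncio.Semaphore`. This is a sketch of the general technique, not the script's code: `download_all` is a hypothetical helper, and `fake_download` stands in for a real media download so the example runs without a Telegram session.

```python
import asyncio

MAX_CONCURRENT_DOWNLOADS = 5  # matches the limit stated above


async def download_all(message_ids, download_one, limit=MAX_CONCURRENT_DOWNLOADS):
    """Run `download_one` for every ID, with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(message_id):
        async with semaphore:  # wait here while `limit` downloads are active
            return await download_one(message_id)

    return await asyncio.gather(*(bounded(m) for m in message_ids))


async def demo():
    active = 0
    peak = 0

    async def fake_download(message_id):
        # Placeholder for a real download; tracks how many run simultaneously.
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)
        active -= 1
        return f"{message_id}.bin"

    results = await download_all(range(20), fake_download)
    return results, peak


results, peak = asyncio.run(demo())
print(len(results), "files downloaded, peak concurrency:", peak)
```
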

## Error Handling 🛠️

- Automatic retry with exponential backoff
- Rate limit compliance
- Network error recovery
- State preservation during interruptions
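
Retry with exponential backoff generally looks like the sketch below. The `retry_with_backoff` helper, its defaults, and the exception types are assumptions for illustration; the script's own retry logic and delays are not documented here. For rate limits specifically, Telethon raises `FloodWaitError`, whose `seconds` attribute gives the wait Telegram asks for, which a real implementation would honor instead of a computed delay.

```python
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call `operation`, doubling the wait after each network failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the error
            # 1s, 2s, 4s, ... capped at max_delay
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))


# Demo: an operation that fails twice, then succeeds. Delays are recorded
# instead of slept so the example finishes instantly.
calls = {"count": 0}
delays = []


def flaky_operation():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network drop")
    return "ok"


result = retry_with_backoff(flaky_operation, sleep=delays.append)
print(result, delays)
```
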

## Limitations ⚠️

- Scraping speed is bounded by Telegram's rate limits
- Can only access public channels or channels you're a member of
- Telegram's media size limits apply to downloads

## License 📄

This project is licensed under the MIT License - see the LICENSE file for details.

## Disclaimer ⚖️

This tool is for educational purposes only. Make sure to:
- Respect Telegram's Terms of Service
- Obtain necessary permissions before scraping
- Use responsibly and ethically
- Comply with data protection regulations