cross-posted from: https://lemmy.sdf.org/post/31995242

Archived

Unveiling Trae: ByteDance’s AI IDE and Its Extensive Data Collection System

Trae - the coding assistant of China’s ByteDance - has rapidly emerged as a formidable competitor to established AI coding assistants like Cursor and GitHub Copilot. Its main selling point? It’s completely free - offering Claude 3.7 Sonnet and GPT-4o without any subscription fees. Unit 221B’s technical analysis, using network traffic interception, binary analysis, and runtime monitoring, has identified a sophisticated telemetry framework that continuously transmits data to multiple ByteDance servers. From a cybersecurity perspective, this represents a complex data collection operation with significant security and privacy implications.

[…]

Key Findings:

  • Persistent connections to at least 5 unique ByteDance domains, creating multiple data transmission vectors
  • Continuous telemetry transmission even during idle periods, indicating an always-on monitoring system
  • Regular update checks and configuration pulls from ByteDance servers, allowing for dynamic control
  • Permanent device identification via machineId parameter, which appears to be derived from hardware identifiers, enabling long-term tracking capabilities
  • Local WebSocket channels observed collecting full file content, with portions potentially transmitted to remote servers
  • Complex local microservice architecture with redundant pathways for code data, suggesting a deliberate system design
  • JWT tokens and authentication data observed in multiple communication channels, presenting potential credential exposure concerns
  • Use of binary MessagePack format observed in data transfers, adding complexity to security analysis
  • Extensive behavioral tracking mechanisms capable of building detailed user activity profiles
  • Sophisticated data segregation across multiple endpoints, consistent with enterprise-grade telemetry systems
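
To illustrate why a hardware-derived machineId enables long-term tracking, here is a minimal sketch. The analysis does not say how Trae computes its machineId, so the hashing scheme and the function names `derive_machine_id` and `local_machine_id` are assumptions for illustration only. The point is that an ID derived from a hardware identifier stays the same across reinstalls and account changes.

```python
import hashlib
import uuid


def derive_machine_id(hardware_identifier: str) -> str:
    """Hash a hardware identifier into a stable, opaque ID.

    Hypothetical scheme: this is NOT known to be what Trae does.
    """
    return hashlib.sha256(hardware_identifier.encode("utf-8")).hexdigest()


def local_machine_id() -> str:
    # uuid.getnode() returns the MAC address as a 48-bit integer
    # (or a random 48-bit value if no MAC address can be read).
    mac = f"{uuid.getnode():012x}"
    return derive_machine_id(mac)


if __name__ == "__main__":
    # The same hardware identifier always yields the same ID.
    print(local_machine_id())
```

Because the input is tied to the hardware and not to an install or a login, clearing application data would not change the resulting ID.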

[…]

  • Kissaki@programming.dev
    2 days ago

    JWTs are standard authentication tools - who’s the security concern for? ByteDance? Or are you saying the JWTs are from the local machine?

    Yes, I read that as local project JWTs being transmitted to their servers. Since it is listed as a concern and not labeled as being used for authentication, IMO it’s clearly implied that they observed JWT tokens and auth data unrelated to any telemetry auth (if they even have any).

    JWT tokens and authentication data observed in multiple communication channels, presenting potential credential exposure concerns
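
For context on why observed JWTs count as an exposure concern: a standard signed JWT is not encrypted, so anyone who can see the token can read its claims. Below is a minimal sketch using only the Python standard library. The helper names `decode_jwt_payload` and `make_demo_token` and the demo claims are made up for illustration; nothing here reflects actual Trae traffic.

```python
import base64
import json


def _b64url(data: bytes) -> str:
    # JWTs use base64url encoding with the '=' padding stripped.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_demo_token(claims: dict) -> str:
    """Build a demo token with a placeholder signature (illustration only)."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    return f"{header}.{payload}.not-a-real-signature"


def decode_jwt_payload(token: str) -> dict:
    """Read a JWT's claims WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload_b64 = parts[1]
    # Restore the padding that base64url encoding stripped.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


if __name__ == "__main__":
    token = make_demo_token({"sub": "alice", "exp": 1700000000})
    print(decode_jwt_payload(token))
```

No key is needed to read the claims. The signature only protects against tampering, so a token seen on any channel is both readable and, until it expires, potentially reusable.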