How UAE Property AI Bot Works: Data Source Transparency from DLD to Analysis
TL;DR — LLM Snapshot
Every number in UAE Property AI Bot traces back to Dubai Land Department registered data. Here is exactly how DLD transaction and Ejari rental data flow from government source to BigQuery to Claude Sonnet analysis — no black box.
| Stage | What happens |
|---|---|
| 1. Ingest | Registered transactions + Ejari rentals → warehouse |
| 2. Aggregate | Building & community metrics (12-month windows) |
| 3. Analyse | Structured prompts + guardrails on top of numbers |
| 4. Deliver | Telegram + Web report + optional PDF (Pro) |
UAE Property AI Bot delivers project analysis, rental yields, and developer track records in seconds. Speed without verified sourcing is noise. This article documents exactly how that analysis is produced — from the official government registry, through every technical step, to the AI model that structures the final output. Every number in the system has a traceable origin. This is what that chain looks like.
The Source: Dubai Land Department
Every property transaction figure and rental contract metric in UAE Property AI Bot traces back to one place: the Dubai Land Department (DLD), the government body responsible for registering real estate transactions, tenancy contracts, and all related property data in Dubai. Its records are the official legal basis for sale prices, rent levels, and transaction volumes — what professionals, regulators, and courts rely on.
The system does not use portal asking prices, broker-reported figures, or third-party listing data. The entire pipeline is built on DLD-sourced datasets: registered transaction records and Ejari registered rent contracts. That keeps every number in the analysis aligned with what the regulator publishes rather than what any market participant chooses to advertise.
This distinction matters because portal asking prices in Dubai typically run 8–13% above DLD registered closing prices, and advertised rents typically run 8–12% above Ejari registered contract rents. Analysis built on portal data produces yield and price figures that are systematically optimistic. Analysis built on DLD data produces figures that reflect what buyers actually paid and what tenants actually registered.
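To see how that premium distorts a yield figure, here is a minimal sketch of the arithmetic, using illustrative numbers and assuming a 10% portal premium on rent (the midpoint of the range above):

```python
# Illustrative figures only, not drawn from the dataset.
purchase_price = 1_000_000   # AED, DLD-registered closing price
ejari_rent = 70_000          # AED/year, Ejari-registered contract rent

# Assume the advertised rent ran ~10% above the registered contract rent
# (midpoint of the 8-12% premium cited above).
portal_asking_rent = ejari_rent * 1.10

registered_yield = ejari_rent / purchase_price         # what was actually signed
projected_yield = portal_asking_rent / purchase_price  # the optimistic version

print(f"Ejari-based gross yield:  {registered_yield:.1%}")
print(f"Portal-based gross yield: {projected_yield:.1%}")
```

The 0.7-point gap between the two yields is pure listing premium; nothing about the property changed.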
From DLD to Cloud Storage
DLD data enters the infrastructure as structured files — typically CSV exports covering transaction records and rent contract records. These files are stored in a dedicated bucket in Google Cloud Storage.
The two primary datasets are:
- Transactions — sale and purchase records including prices, dates, project identifiers, area designations, unit types, and related fields.
- Rent contracts — Ejari-registered tenancy data including rent amounts, contract dates, property references, and contract type (new versus renewal).
The same files stored in Cloud Storage are the exact files that feed BigQuery. There are no hidden intermediate sources and no manual edits to the numbers between the DLD export and the database. The upload and refresh frequency tracks DLD export availability.
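As a sketch of how such a bucket might be laid out (the bucket name and path scheme here are assumptions, not the production values):

```python
from datetime import date

BUCKET = "dld-raw-exports"  # hypothetical bucket name

def blob_path(dataset: str, export_date: date) -> str:
    """Object path for one DLD export file inside the Cloud Storage bucket.

    In production the path would be passed to google-cloud-storage's
    Bucket.blob(...).upload_from_filename(...); only naming is shown here.
    """
    return f"{dataset}/{export_date.isoformat()}.csv"

print(blob_path("transactions", date(2024, 6, 1)))
print(blob_path("rent_contracts", date(2024, 6, 1)))
```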
From Cloud Storage to BigQuery
The CSV files in Cloud Storage are loaded into Google BigQuery by Python scripts. Each script reads a specific dataset — transactions or rent contracts — and loads it into a dedicated BigQuery table using a truncate-and-replace approach: each run replaces the table with the contents of the latest file. Schemas are applied consistently so that price fields, date fields, and area identifiers are typed correctly and ready for structured querying.
The result is a single source of truth inside BigQuery that maps directly to DLD-sourced data. No numbers are changed or reinterpreted during the load step. It is a direct transfer from government export file to queryable table.
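The typing guarantee of that load step can be sketched as follows; the column names are illustrative, and in production the coercion happens inside a BigQuery load job configured with WRITE_TRUNCATE rather than in Python:

```python
import csv
import io
from datetime import date

# Assumed column names for illustration; the real schema mirrors the
# DLD export headers.
SCHEMA = {
    "transaction_date": date.fromisoformat,
    "price_aed": float,
    "area_sqft": float,
    "project_name": str,
}

# A one-row stand-in for a DLD CSV export.
raw = io.StringIO(
    "transaction_date,price_aed,area_sqft,project_name\n"
    "2024-05-14,1450000,1082,Example Tower\n"
)

# Apply the schema so every field is typed before it becomes queryable.
rows = [
    {col: SCHEMA[col](val) for col, val in record.items()}
    for record in csv.DictReader(raw)
]

print(rows[0]["transaction_date"], rows[0]["price_aed"])
```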
How the Bot Queries the Data
When a search is run — for a project, a master community, or a developer — the system executes structured SQL queries against the BigQuery tables. These queries are defined in the codebase and return specific outputs based on the search type.
At the project level, queries return transaction counts, price per sqft (median and trend), completion status, registered unit counts, and — where sufficient Ejari data exists — rental contract counts, current rent levels, and rent movement over time. Developer and master community summaries are aggregated from the same underlying transaction and rental tables.
The query results are assembled into a structured data payload: a clean, typed summary of what BigQuery returned for that specific project or community. That payload is what gets sent to the AI model. The model does not receive raw CSV files, unstructured text, or any data source other than the specific BigQuery output from the DLD-sourced tables.
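A minimal sketch of what assembling that payload can look like, assuming illustrative field names rather than the production schema:

```python
from statistics import median

# Example rows as a project-level query might return them (illustrative).
transactions = [
    {"price_aed": 1_450_000, "area_sqft": 1082},
    {"price_aed": 980_000, "area_sqft": 744},
    {"price_aed": 1_210_000, "area_sqft": 905},
]

# The clean, typed summary that is handed to the model: counts and
# aggregates only, never raw files or free text.
payload = {
    "project": "Example Tower",
    "transaction_count": len(transactions),
    "median_price_per_sqft": round(
        median(t["price_aed"] / t["area_sqft"] for t in transactions), 1
    ),
}
print(payload)
```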
How Claude Sonnet Turns Data Into Analysis
Claude Sonnet, accessed via the Anthropic API, receives the structured BigQuery payload and produces the analysis sections visible in the bot output and Pro reports.
The model receives two inputs: the structured data payload from the query, and a fixed system prompt. The system prompt describes relevant Dubai market context — regulatory frameworks, supply dynamics, typical yield ranges by community type, known risk factors — and sets explicit rules for how the model must behave when writing analysis.
Those rules are direct: base all quantitative claims only on the data in the payload, do not invent figures, read trend direction from actual data fields rather than inference, and flag data gaps explicitly rather than substituting generic market commentary. If the transaction volume for a project is insufficient to support a yield conclusion, the output must say so.
Claude Sonnet is an interpreter of the data the system queries — not an independent source of market information. It has no internet access during a request, no direct access to DLD or any external database, and no ability to introduce outside data. It works only with the structured payload it receives. That constraint is what makes the analysis auditable: the model cannot produce a number that does not originate in the DLD-sourced BigQuery tables.
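How the two inputs come together can be sketched like this; the guardrail wording is illustrative, not the production system prompt:

```python
import json

# Illustrative guardrail prompt, not the production text.
SYSTEM_PROMPT = (
    "Base every quantitative claim only on the JSON payload. "
    "Do not invent figures. Flag data gaps explicitly."
)

# The structured query output from the DLD-sourced BigQuery tables.
payload = {
    "project": "Example Tower",
    "transaction_count": 3,
    "median_price_per_sqft": 1337.0,
}

# In production these become the `system` and `messages` arguments of an
# Anthropic Messages API call to Claude Sonnet. Nothing beyond the fixed
# prompt and this payload is in scope for the model.
request = {
    "system": SYSTEM_PROMPT,
    "messages": [{"role": "user", "content": json.dumps(payload)}],
}
print(request["messages"][0]["content"])
```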
The Google Search enrichment in Pro reports operates separately — surfacing developer news, project announcements, and operational cost context — and is clearly marked as supplementary to the core DLD analysis rather than part of the transaction data chain.
What This Means for the Analysis You See
The practical consequence of this architecture is that the figures in any bot analysis have a specific, traceable origin.
When a Pro report states a median transaction price of AED 1,340/sqft for a specific project, that figure comes from DLD registered closing prices for that project in BigQuery — not from a portal listing, a broker brochure, or a developer marketing sheet. When it shows a gross yield calculation, the rent figure comes from Ejari registered contracts for that building and the price figure comes from DLD registered transactions for that building.
The DLD data is not perfect. Like any government registry, it has lags between transaction and registration, gaps in project coverage for very new launches with no registered completions, and occasional data quality issues in historical records. Where data is insufficient for a reliable conclusion, the analysis output flags that gap rather than filling it with assumptions. A project with fewer than 10 registered transactions in the past 12 months will show a confidence qualifier on its yield and price figures.
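The low-volume qualifier described above reduces to a simple threshold check; the function and return strings here are illustrative, with the threshold of 10 taken from the text:

```python
MIN_TRANSACTIONS_12M = 10  # below this, yield/price figures get a qualifier

def confidence_flag(tx_count_12m: int) -> str:
    """Label the reliability of yield and price figures for one project."""
    if tx_count_12m >= MIN_TRANSACTIONS_12M:
        return "ok"
    return "low-volume: figures carry a confidence qualifier"

print(confidence_flag(23))
print(confidence_flag(4))
```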
The chain is: DLD → Google Cloud Storage → BigQuery → structured query output → Claude Sonnet → analysis. Every link is documented. Every number has a path back to a government-registered source.
Running Your Own Analysis
The transparency described here is the foundation for every search in UAE Property AI Bot. To apply it to a specific project or community you are evaluating:
- Open the Web App via /project_search to analyse any of 700+ DLD-registered projects — the output surfaces transaction volume, price per sqft trend, Ejari rental density, and a full AI analysis including red flags, green flags, and a Buy/Pass verdict, all drawn from the data chain described above.
- Use /master_search for master community-level analysis — comparing sub-communities within JVC, JLT, Arabian Ranches, or Dubai South against each other on DLD transaction data.
- Use /dev_search for developer track record analysis — delivery history, price performance across the portfolio, and volume of registered completions drawn from DLD records.
- Start with /top_apartments or /top_villas — the free daily ranking of the top 10 apartments and villas by total return from DLD data, with an AI summary.
FAQ
Where does UAE Property AI Bot get its data? All transaction price and rental yield data traces back to Dubai Land Department registered records. DLD transaction data and Ejari registered rent contract data are loaded from government-sourced CSV exports into Google BigQuery. The bot queries BigQuery directly. No portal asking prices, broker figures, or third-party listing data are used in the core metrics.
What is the difference between DLD data and portal data? Portal platforms publish asking prices and asking rents — what sellers and landlords request, not what buyers pay and tenants register. DLD registered transaction prices are the actual closing prices recorded at government registration. Ejari registered rents are the actual contract amounts registered with the government. Portal asking prices typically run 8–13% above DLD closing prices. Ejari rents typically run 8–12% below portal asking rents. Yield calculations built on portal data are systematically overstated relative to DLD and Ejari-based calculations.
Does Claude Sonnet invent or estimate any figures in the analysis? No. Claude Sonnet receives only the structured data payload returned by the BigQuery queries — which contains only DLD-sourced figures — plus a fixed system prompt. The model's instructions explicitly prohibit inventing figures, inferring numbers not present in the data, or substituting generic market commentary for data gaps. Where data is insufficient for a reliable conclusion, the output is required to flag that gap. The model has no internet access during a request and cannot introduce outside data.
How current is the data? The data is as current as the most recent DLD export loaded into BigQuery. Refresh frequency tracks DLD export availability. There is typically a lag between a transaction occurring and its registration appearing in DLD records — standard for any government registry. Very recent transactions and newly launched projects with no registered completions will not appear in the dataset or will appear with insufficient volume for reliable analysis, which the output will indicate.
Can I see the analysis for a specific project before subscribing to Pro? Yes. /top_apartments and /top_villas are always free — these rank the top 10 apartments and villas by total return from DLD data with an AI summary. The Web App via /project_search, /master_search, and /dev_search allows 3 searches per day on the free tier. Pro (1,600 Telegram Stars/month, approximately 100 AED) unlocks unlimited searches and the full 10-section analysis including red flags, green flags, Buy/Pass verdict, service charge data, alternatives, and a PDF forensic report.
Not investment advice. All analysis based on DLD registered transaction data and Ejari registered rental contracts.