This repository has been archived by the owner on Sep 6, 2023. It is now read-only.

Export to OneLake (Fabric) #120

Open
RonKoppelaar opened this issue Jun 23, 2023 · 14 comments

Comments

@RonKoppelaar
Contributor

Currently MS is making a lot of noise regarding Fabric, a complete SaaS solution for all BI-related workloads. The foundation of Fabric is OneLake, which I guess is technically a Data Lake (vNext) storage account. Having the BC extension able to push data directly into OneLake would be a great improvement to the extension.

Within our organisation the BI team is investigating Fabric in more detail. I guess for now they will use custom code to get BC data into OneLake, or use a dedicated data lake to store the data first.

@DuttaSoumya
Contributor

Thanks @RonKoppelaar for that question. Generally speaking, one can easily use the OneLake shortcuts feature to see the BC data in Fabric. However, this still requires the current method of maintaining your own data lake storage account. We are indeed investigating how best to land the BC data directly in Fabric's OneLake, so that data administrators only have to maintain the data in one place.

Having said that, I would like to ask which other features of Microsoft Fabric actually appeal to users in the SMB segment that BC serves. I would like to open this thread to everyone in our small community to give us feedback on this topic.

Best regards,
@DuttaSoumya.

@RonKoppelaar
Contributor Author

If I understand you right:

  • We export to the data lake as we do today (with bc2adls)
  • With shortcuts we can link the data lake storage account to OneLake in Fabric?

@DuttaSoumya
Contributor

Yes @RonKoppelaar. To be precise, the data lake that you currently export your BC data to can be visualized in Fabric by creating an ADLS shortcut.

@njackson1582

This would be a huge feature for us and would create an almost guaranteed sale of Microsoft Fabric for any and all clients that want reporting from BC, especially clients with large datasets.

The big appeal of Fabric in the SMB market is (my personal opinion):

  1. SaaS Pricing. SMBs are much more sensitive to variable pricing in our experience.
  2. Lower Pricing. Dedicated Fabric tiers start at around $250/month with incredible performance. This is great all-inclusive pricing.
  3. One-Stop Shop. Fabric is an all-in-one product, allowing semi-technical resources to do data manipulation in a much more user-friendly environment compared to Synapse.
  4. Data-First Environment. Fabric enables SMBs to take a data-first approach to decision-making better than the individual pieces have allowed in the past (especially OneLake).
  5. Affordable Code-First Approach. Our team has a strong background in code and we often want to use Apache Spark to perform common ETL actions. However, the cheapest Apache Spark pool in Synapse costs hundreds if not thousands of dollars a month, far too much for most of our clients. But with Fabric, we can run Spark under the SaaS pricing model.

@trimline-gaiustemple

trimline-gaiustemple commented Jun 28, 2023

@DuttaSoumya When creating this shortcut, am I right that you first have to create the shortcut under the "Files" section (not Tables), then Load to Table?
If so, I am encountering errors. For example:
Invalid column name(s) 'timestamp-0, systemId-2000000000, SystemCreatedAt-2000000001, $Company, Code-1, Description-5, Receive-10, Ship-11, PutAway-12, Pick-13, SystemCreatedBy-2000000002, SystemModifiedAt-2000000003, SystemModifiedBy-2000000004'. Column names must contain UTF-8 encoded Unicode word characters and can be a maximum of 128 characters long. Unicode word characters include letters of any case and their nonspacing marks, punctuation connectors like the underscore(_), and decimal digits.

Edit: could the hyphen here be changed to an underscore? I also suspect $Company may cause issues:
ConcatNameIdTok: Label '%1-%2', Comment = '%1: Name, %2: ID';
CompanyFieldName: Label '$Company';
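For illustration, a minimal PySpark sketch (not part of the extension; the shortcut name 'bc' and the entity folder 'Location-14' are assumptions based on the error above) that renames the offending columns before saving, so they satisfy the column-name rule quoted in the error:

import re

# Hypothetical workaround: rename columns so they contain only Unicode
# word characters, as Load to Table requires. 'spark' is the session
# that Fabric notebooks provide by default.
shortcut_name = 'bc'
entity_name = 'Location-14'

df = spark.read.parquet(f'Files/{shortcut_name}/data/{entity_name}')

# r'\W' matches anything that is not a letter, digit or underscore, so
# 'Code-1' becomes 'Code_1' and '$Company' becomes '_Company'.
for col in df.columns:
    df = df.withColumnRenamed(col, re.sub(r'\W', '_', col))

df.write.mode("overwrite").format("delta").saveAsTable("Location_14")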

@DuttaSoumya
Contributor

DuttaSoumya commented Jul 5, 2023

Thanks for all the great feedback on this matter! I concur with the other comments in this thread that the true benefits of Fabric come when BC data is placed directly in Fabric. Meanwhile, I would like to know your opinion on the choice between the lakehouse and the warehouse, both of which Fabric supports.

To answer your question @trimline-gaiustemple, my suggestion on the OneLake shortcuts was merely to visualize the data in Fabric while the main data still resides in a lake outside Fabric, and yes, the shortcut goes in the Files section, like so:
[screenshot: ADLS shortcut under the Files section of a lakehouse]

If the purpose is to load the data into OneLake tables instead, one may simply run a notebook like the one below to get a table view like so (note that the column names are unchanged):
[screenshot: the resulting table in the lakehouse with the original column names]

# Parameters
shortcut_name = 'bc'
entity_name = 'CustLedgerEntry-21'

import re

# Read the parquet files exposed through the OneLake shortcut
df = spark.read.parquet(f'Files/{shortcut_name}/data/{entity_name}')

# Replace characters that are invalid in table names with underscores,
# e.g. 'CustLedgerEntry-21' becomes 'CustLedgerEntry_21'
table_name = re.sub('[^a-zA-Z0-9_]', '_', entity_name)

# Save the data as a Delta table in the lakehouse
df.write.mode("overwrite").format("delta").saveAsTable(table_name)
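To load every exported entity in one go rather than a single one, a hedged extension of the notebook above (a sketch, assuming the Fabric runtime's mssparkutils is available and that each entity sits in its own folder under Files/{shortcut_name}/data):

import re

shortcut_name = 'bc'
base_path = f'Files/{shortcut_name}/data'

# List the entity folders behind the shortcut and load each one into
# its own Delta table, sanitizing the folder name for the table name.
for entry in mssparkutils.fs.ls(base_path):
    if not entry.isDir:
        continue
    df = spark.read.parquet(f'{base_path}/{entry.name}')
    table_name = re.sub('[^a-zA-Z0-9_]', '_', entry.name)
    df.write.mode("overwrite").format("delta").saveAsTable(table_name)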

@Bertverbeek4PS
Contributor

@DuttaSoumya I would say lakehouse, because it is unstructured and file-based.
The customer can then move the data into a data warehouse for a structured data model and do their reporting on that.

@njackson1582

@DuttaSoumya I agree with @Bertverbeek4PS that a lakehouse would be the better fit, since the data will arrive in an unstructured (though highly organized) fashion.

@trimline-gaiustemple

@DuttaSoumya I also agree with @Bertverbeek4PS and @njackson1582 that Lakehouse is the better fit.

@trimline-gaiustemple

Just out of interest, is this something that is planned and if so is there a rough timescale?

@trimline-gaiustemple

Hey @JavierSassen,
Thanks for the link. I had noticed that article a week or two back; however, for large tables OData isn't ideal, as we hit API limitations. It also means we have to create custom API pages for fields that aren't in the default API pages. This extension fixes that by letting us tick the relevant fields, and it uses the last modified time to sync only modified data. IMO it makes a lot of sense for this extension either to natively support a "shortcut" under the "Tables" section of a lakehouse, so that the advertised performance of Fabric can be realised without additional hassle, or to offer a better way of getting the data into Fabric.

@njackson1582

@trimline-gaiustemple
That is the exact issue we ran into with OData. Report refreshes were timing out due to large data sizes. This (amongst other reasons) is why I would love a direct integration into Fabric.

However, one thing that may be in progress behind the scenes on the BC team at Microsoft is a delta export. This would be huge, because it would let us use OData even for massive datasets, as long as the created/updated content is minimal. Kennie at MS teased this idea a bit: https://twitter.com/KennieNP/status/1677015185549144065
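For context, the kind of delta pull that teaser hints at would look roughly like this against the standard BC v2.0 APIs, which expose lastModifiedDateTime on their entities. A sketch only; the tenant, environment, company id and token are placeholders:

import requests

# Hypothetical incremental pull: fetch only the records created or
# modified since the last successful sync. The URL placeholders
# ({tenant}, {environment}, {companyId}) and the token must be filled in.
base = ('https://api.businesscentral.dynamics.com/v2.0/'
        '{tenant}/{environment}/api/v2.0/companies({companyId})')
since = '2023-07-01T00:00:00Z'  # timestamp of the last successful sync

resp = requests.get(
    f'{base}/customers',
    params={'$filter': f'lastModifiedDateTime gt {since}'},
    headers={'Authorization': 'Bearer <access token>'},
)
resp.raise_for_status()
changed_rows = resp.json()['value']  # only the delta, not the full table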

@Bertverbeek4PS
Contributor

I have created a branch that enables exports to a lakehouse in MS Fabric.
A notebook then pushes the data into a table.
You can try it out and play with it:
https://github.com/Bertverbeek4PS/bc2adls/tree/Microsoft-Fabric-Integration

Blog:
https://www.bertverbeek.nl/blog/2023/08/15/bc2adls-ms-fabric-enabled-part-1/
