Software and me: Entity Framework

Showing posts with label Entity Framework. Show all posts

Thursday, 23 February 2012

Primary Keys in Entity Framework and Sync Framework - Part 2

In Part 1 we considered various options when it came to using Microsoft Sync Framework with Entity Framework.

After a certain amount of pain we seem to have got to something approaching a settled condition.

We didn't want IDs that were GUIDs as this would make them unreadable. So, we are keeping Primary keys as simple ID columns.
Because of this, we went for a solution that would reseed the local tables on each client to a known range. After each synchronisation, the tables would be reseeded to the correct value. That way entries created on the client/server would have different IDs and wouldn't clash
We had to use SQL Server Express on the clients.

Number (3) above is a long story...

SQL Compact 4 won't play nicely with Entity Framework or the Sync Framework (Link here and here )
SQL Compact 3.5 wouldn't play nicely with the various items - follow this link and see the section "Limitations of SQL Server compact" - basically it can't:

"support entities with server-generated keys or values when it is used with the Entity Framework ".

There's a lot of other things that SQL server compact can't do - including supporting stored procedures. So that's how we ended up back at SQL Server Express.

I hope this gives you a flavour of the sort of things to consider when looking at data replication on SQL Server.

You're going to ask why didn't we use SQL Server Merge Replication - that's another story and another post...

Thursday, 29 December 2011

Primary Keys in Entity Framework and Sync Framework

Hold on - we're going for a walk on the wild side! It will take a few minutes to read - but it's a thriller...

This is a summary of what I've discovered after a bit of research - hopefully it will help other developers/architects out.

As part of a project I'm looking at creating a SQL server replicated architecture... and need to avoid clashes in the PK. The icing on the cake is that we're using Entity Framework heavily in the project for data access.

Exhibit 1 ... http://msdn.microsoft.com/en-us/library/bb726011.aspx

This article gives a few options about choosing a suitable Primary key in replication scenarios:
1. Using a GUID
2. Pk that includes a node ID
3. Natural keys

OK.... starting from the bottom up...

Natural Keys won't work for our project. Full stop
A PK that includes a Node ID - means adding a column and giving each client a Node ID. Not the end of the world, but... doesn't feel right...
GUID... sounds good, or is it?

Enter Exhibit 2 - http://leedumond.com/blog/using-a-guid-as-an-entitykey-in-entity-framework-4/
Yes, it's all true... I tried it and had to hack the EF model XML to include the correct attribute to get it to work. And, Yes, doing an update loses the change.
Yes, it's a feature and yes, it's not pleasant - it would mean after every update hacking the XML to find all the relevant GUID PKs and adding in the attribute. I'm sure we could automate it by some clever searching and replacing, but it's not right.

Also, Enter Exhibits 3, 4, 5 and 6
http://www.eggheadcafe.com/tutorials/sql-server/d662b371-ed27-481c-aee0-ebe7cf2d9fad/why-guids-are-not-a-good-idea-for-sql-server-primary-keys.aspx
http://databases.aspfaq.com/database/what-should-i-choose-for-my-primary-key.html
http://www.simple-talk.com/sql/database-administration/the-identity-crisis-in-replication/
http://sqlblogcasts.com/blogs/martinbell/archive/2009/05/25/GUID-Fragmentation-in-SQL-Server.aspx

From this set of links you start to see that GUIDs are good, but come with a number of caveats.
Also using newsequentialid() - doesn't necessarily generate sequential GUIDs - I tried this. If you do a number of inserts close together then you do get them, but not always.

So, you then can potentially have the performance issues.
Also, the space issue is relevant for us - since we're looking to replicate to SQL server compact.

Alternatives?

First off, well we could use a normal ID (int) column and change the seed value for each client.
E.g. The Server has a seed of 10,000000
Client 1 has a seed of 20, 000000,
Client 2 has a seed of 30, 000000 etc

So, each client would be able to make 9,999,999 inserts before a conflict and we'd have a limit of 214(X2) users because we could have negative numbers as well.

Lets just do some maths....

The max value of an int is +/-2,147,483647.

If you're really worried about conflicts you could go for using a bigint as the ID column, then you could have something like 9223327036850 clients! The seeds could then be something like 100,000000, 200000000, 300000000, etc so 99,999,999 inserts before a conflict - which should be enough for a lifetime - e.g. one insert a second would take you 1157 days. 100 inserts a day would take over 2500 years.

If you were using the the smaller int then 100 inserts a day would last over 250 years!

That means changing the seed on each client appropriately.
Now, there are ways of doing this:
http://www.techrepublic.com/blog/datacenter/how-do-i-reseed-a-sql-server-identity-column/406 but it seems there a some gotchas that you have to be aware of:
http://geekswithblogs.net/argot/archive/2009/10/18/sync-framework-common-practise-of-ado.net.aspx

Basically, you have to store and remember seed values before/after synchronisation.
You'd keep a list of seed values for each client. So any time you get a new client the first sync would also download its seed value.

This solution has a certain appeal, because:
a. it leaves the PK alone as a single ID column
b. The replication mechanism takes care of the seeding/reseeding.
c. Entity Framework designer won't throw any wobblies.

Next alternative...

Go back to a compound PK, ID plus a Node ID.
a. Entity Framework should cope with this.
b. Replication mechanism should be a bit simpler.
But... it means infecting the database with the replication details i.e. the Node ID and artificially creating a more complex PK. So, you can't just go select * from Users where ID = 10.

Another thought....

This is horrible I think, but has a certain "something" about it!

Leave the PKs alone completely - simple ID column with an int.
Then code for the PK conflicts in the sync framework - as it will detect PK conflicts.

So, we write logic that will do something like:
- determine of the inserted record exists on the server

- if not, then add it (generating a new record/PK) then remove it from the client and

- resync the client to pull down the "new" record but with the updated PK.

You'd have to write this sort of thing to cover various scenarios and it may/would be table-specific - but what if we replicating a lot of tables?

Summing up...

We're looking at either the compound PK or the different seed values.

We're going to try some stuff out - probably the different seed values first as that leaves the EF model alone completely - and I'll try and report back how it went.

Thursday, 8 September 2011

Entity Framework Tip #1 - Using MergeOption to get latest entities

Hopefully this is the first of various Entity Framework tips that I've gleaned along the way.
They'll serve as reminders for me and if they can help other people, they will have served their purpose.

Ok, what's wrong with this?

(We're using STEs because we're throwing things across the network).
I was doing some testing and had code that looked like this.

 DBEntities context1 = new DBEntities();
 
 DBEntities context2 = new DBEntities();
 Site newSite = new Site();
 newSite.SiteName = "QAZ";
 
 context1.Sites.AddObject(newSite);
 context1.ApplyChanges("Sites", newSite);
 context1.SaveChanges();
 
 Site site2 = context2.Sites.Where(s => s.ID == newSite.ID).FirstOrDefault();
 
 // update the site...
 site2.ChangeTracker.ChangeTrackingEnabled = true; // Because they're STEs
 site2.SiteName = "QAZ QAZ";
 context2.ApplyChanges("Sites", site2);
 context2.SaveChanges();
 
 
 // I thought I'd use the first context to query for the updated site...
 
 var query = context1.Sites.Where(s => s.ID == newSite.ID);

 Site site3 = query.FirstOrDefault();

What's the SiteName of site3?

No, it's not "QAZ QAZ" - it's "QAZ"

Why is that?

Because the site already exists in context1 - so even though a database query is actually executed, EF will not overwrite what's in context1 because its cache already contains an entity with that EntityKey.

How to make sure that I actually get the latest values back?

Set the MergeOption on the objectSet before doing the call - as follows...

context1.Sites.MergeOption = MergeOption.OverwriteChanges;

The default option is AppendOnly which won't update the context if it finds an entity with the

same entityKey present.

Another tip from the twighlight world of entity framework...

One thing I've yet to investigate is how delayed execution would affect MergOption settings.
So, if I reset the MergeOption after the query to AppendOnly and then do further queries before finally executing the query - would the different MergeOptions be applied at the right time... hmmm...

CRM Development