Fun With Linq!


NOTE: this article was written against a beta version of the product or technology discussed in this article. There is absolutely no guarantee that the solution or code demonstrated here will work on the final version of the product.

Quite often I have discussed the impossibility of joining data from a database with that of the file system. I discussed it with my team members, mentioned it in trainings and wrote about it recently in this article.

Being able to join data from a database with other data from a file system could be very useful though.

Introduction

Imagine a web based content management system that allows a user to enter articles. Together with the article, she can upload one more images that are presented with the image. So far so good. However, when the article is deleted from the database, the images should also have been deleted. Unfortunately, quite often a developer forgets to add this functionality to the system.

I have seen it many times before in old web applications that we're maintaining. We have huge folders with images, probably half of which are orphaned. I often joked that it would be nice to do something like this:

DELETE FROM C:\Uploads\Images WHERE File.Name NOT IN (SELECT FileName FROM Attachments) 

where C:\Uploads\Images is the folder containing the uploaded images and Attachments is a database table with a column called FileName that holds the filename of an image, like NewsArticle123.jpg.

Obviously, this doesn't work as the file system doesn't understand database parlance, while at the same time the database doesn't understand what a file is, or how the file system works.

For some time, I was hoping that WinFS would enable this kind of querying, but I guess we'll have to wait a bit more to find out.

But as soon as I saw Linq and some of its database and collection capabilities, I figured it should be possible to get real close to my final syntax. With the help of a single Linq to SQL file, some drag and drop operations and a number of lines of code, I was able to accomplish what I wanted: an easy way to delete orphaned images. Here's what I did:

  1. Added a Linq to SQL file to my .NET 3.5 Windows Application.

  2. From the Server Explorer, dragged the Attachments table onto the Object Relational Designer.

  3. Called the following code from a Button's Click event:
public void DeleteOrphans()
{
  string imagePath = @"C:\Uploads\Images";
  FileInfo[] myFiles = new DirectoryInfo(imagePath).GetFiles();

  MyAttachmentsDataContext db = new MyAttachmentsDataContext();

  var orphanedFiles =
      from o in myFiles
      join a in db.Attachments on o.Name equals a.FileName into theFiles
      where !db.Attachments.Any(p => p.FileName == o.Name)
      select o;

  foreach (var orphanedFile in orphanedFiles)
  {
    orphanedFile.Delete();
  }
}

The first two lines of the code get an array of FileInfo objects with all the files in the C:\Uploads\Images folder. Next, these files are joined with the result of the Attachments table with a where clause that filters out all files that have a matching Attachment record in the database. Once the query is fired in the foreach block, the resulting orphanedFiles is a collection of FileInfo instances that don't have an associated record in the Attachments table and can thus be deleted.

Notice how the type inference system did a good job in determining the type of the items in the collection. Although I didn't explicitly determine a type for the orphanedFiles collection, the compiler knows that it will contain instances of FileInfo, so I can call the Delete method without an additional cast.

This is my first stab at solving this problem with Linq, so there may be better or easier ways to do it and get even closer to my initial code example. I also think it won't scale very well to folders with thousands of files. However, it's a nice example and was fun to figure out. It also shows the great potential of the upcoming Linq technologies.


Where to Next?

Wonder where to go next? You can read existing comments below or you can post a comment yourself on this article .


Consider making a donation
Please consider making a donation using PayPal. Your donation helps me to pay the bills so I can keep running Imar.Spaanjaars.Com, providing fresh content as often as possible.



Feedback by Other Visitors of Imar.Spaanjaars.Com

On Sunday, April 08, 2007 8:47:40 PM yousaid said:
Greetings,
I have been watching and noticing that you and increasingly many other developers are switching from VB to C# and it's variants.
Do you think there's that much future left for VB?
cheers,
yousaid
On Sunday, April 08, 2007 9:16:15 PM Imar Spaanjaars said:
Hi yousaid,

I am not sure I agree with your observations. My last book was entirely in VB and I also quite often post articles in VB and C# at the same time.

The example above is in C#, but that's purely coincidence. The post is about how cool Linq is; not about the actual syntax used to accomplish the deletes.

Considering the amount of efforts Microsoft is investing in the next version of VB (including many language enhancements and Linq), I'd say VB is far from being dead.

Imar
On Sunday, April 08, 2007 9:56:08 PM yousaid said:
Greetings,
Ok, that's reassuring, I am now teaching myself C#, just incase.
Off topic:
I have bought many books you have been involved in from your DreamWeaver days to now. MM or is it Adobe sent me email the other day about a new version of DW, but honestly, I barely use it these days and have uninstalled it from my development boxes. I personally think, their refusal to support ASP.NET 2.0 is baffling. What do you think?
cheers,
yousaid
On Sunday, April 08, 2007 10:18:45 PM Imar Spaanjaars said:
Hi ,

Yeah, it's always useful to understand both (or more) languages.

Re: Dreamweaver: I think I have to agree with you. You can read my full take on it here: http://www.spaanjaars.com/QuickDocId.aspx?quickdoc=389

Re my books: glad you like them; hopefully you continue to read them in the future....

Cheers,

Imar
On Thursday, March 27, 2008 2:32:48 AM confused_with_linq said:
Can you explain what the where clause is doing?  What is p, just an alias for the resultset? If so it reads where there is a result or the result has a filename or the file has a filename.

where !db.Attachments.Any(p => p.FileName == o.Name)
On Sunday, March 30, 2008 7:04:24 PM Imar Spaanjaars said:
Hi there,

The p is called a range variable and is used to refer to the result object from within the query.

The where clause basically says: "give me all the attachments that don't have a corresponding record in db.Attachments". (Note the ! in front of the where clause to reverse the Any selection.)

You may want to read some getting started stuff for LINQ: http://www.google.com/search?hl=en&q=getting+started+LINQ

Cheers,

Imar

Talk Back! Comment on Imar.Spaanjaars.Com

I am interested in what you have to say about this article. Feel free to post any comments, remarks or questions you may have about this article. The Talk Back feature is not meant for technical questions that are not directly related to this article. So, a post like "Hey, can you tell me how I can upload files to a MySQL database in PHP?" is likely to be removed. Also spam and unrealistic job offers will be deleted immediately.

When you post a comment, you have to provide your name and the comment. Your e-mail address is optional and you only need to provide it if you want me to contact you. It will not be displayed along with your comment. I got sick and tired of the comment spam I was receiving, so I have protected this page with a simple calculation exercise. This means that if you want to leave a comment, you'll need to complete the calculation before you hit the Post Comment button.

If you want to object to a comment made by another visitor, be sure to contact me and I'll look into it ASAP. Don't forget to mention the page link, or the QuickDocId of the document.

For more information about the Talk Back feature, check out this news item.