Another Stranger Me

my comments, my projects, my resources…

Alfresco/Lucene killer sorts

Our customers almost always have a business requirement that is both reasonable and a real performance killer: sorting search results by the last modification time, i.e. cm:modified in the Alfresco model.

Let’s first explain how sorting in Lucene actually works. Imagine that we have 10 million documents in our index, of which 100,000 were updated within the last day. Searching for the documents modified within the past day works quite fast. But when you add a sort by date, things get dead slow.

This is because Lucene first loads the modification timestamps for the documents that match the query (potentially all 100,000 of them if you were unfortunate enough to allow the ‘*’ wildcard), sorts them, and only then completes the search on that ordered list. Sorting a huge number of date strings is expensive even on modern CPUs. Add a couple of such requests from multiple users at the same time and the whole application can become dead slow until the sorting completes.
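To make the two steps concrete, here is a deliberately simplified Java model of field-cache style sorting. It is an illustration under stated assumptions, not Lucene’s actual code: the class and method names are hypothetical, the “index” is just an array of ISO-8601 timestamp strings, and the sort values are loaded into an array indexed by document number before the matching hits are ordered by looking them up.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Toy model of field-cache style sorting (hypothetical names, not Lucene's API). */
public class FieldCacheSortDemo {

    /** Step 1: load the sort value of every document into an array indexed by doc number. */
    static String[] loadFieldCache(String[] storedModified) {
        // In this toy model the "index" is just an array of ISO-8601 timestamps.
        return Arrays.copyOf(storedModified, storedModified.length);
    }

    /** Step 2: order the matching doc numbers by their cached values, newest first. */
    static Integer[] sortHitsByModifiedDesc(int[] hits, String[] cache) {
        Integer[] boxed = Arrays.stream(hits).boxed().toArray(Integer[]::new);
        // ISO-8601 strings sort chronologically, but every comparison is a string comparison.
        Arrays.sort(boxed, Comparator.comparing((Integer doc) -> cache[doc]).reversed());
        return boxed;
    }

    public static void main(String[] args) {
        String[] modified = {
            "2011-07-30T12:00:00", // doc 0
            "2011-08-01T10:00:00", // doc 1
            "2011-08-02T09:06:00", // doc 2
            "2011-08-02T09:28:00"  // doc 3
        };
        String[] cache = loadFieldCache(modified);
        // Suppose the query matched docs 1, 2 and 3 (modified within the last day).
        Integer[] sorted = sortHitsByModifiedDesc(new int[] {1, 2, 3}, cache);
        System.out.println(Arrays.toString(sorted)); // [3, 2, 1]
    }
}
```

The point of the model is that the cost grows with the number of values that have to be loaded and compared, which is why the options below try to shrink or cheapen that work.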

My colleagues even ran a simple test search by noderef, which always returns a single result. When they added sorting by any property, such as title, getting the result took twice as long.

There are several approaches to this problem:
1) Avoid sorting on properties that can potentially have a huge number of distinct values; better yet, avoid sorting of any kind.

2) With Alfresco we’ve found that whenever a document is changed, a new row is inserted into the database, and that the Lucene index contains the node-dbid property, which is an integer. So we sort on the node-dbid field instead. While the results may not be 100% identical to sorting by the last modification datetime in every case, the result sets were the same in all our tests, and performance was much better because we are now sorting integers.

3) Implement your own metadata property that holds a timestamp, keep it updated, and use it for sorting.
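The following Java sketch compares the sort keys behind the three options: the date string (what a sort on cm:modified compares), the integer DB id (option 2) and a numeric timestamp property (option 3). The Hit type and the sample data are hypothetical, and the DB id ordering only matches the date ordering under the assumption from option 2 that ids grow with every change.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy comparison of sort keys for "newest first" (hypothetical types, not Alfresco's API). */
public class NumericSortKeyDemo {

    /** A search hit with a DB id, an ISO-8601 cm:modified value and a custom numeric timestamp. */
    static final class Hit {
        final long dbid;
        final String modified;
        final long modifiedMillis; // hypothetical custom property for option 3

        Hit(long dbid, String modified) {
            this.dbid = dbid;
            this.modified = modified;
            this.modifiedMillis = Instant.parse(modified).toEpochMilli();
        }
    }

    // Baseline: compare date strings, as a sort on cm:modified would.
    static final Comparator<Hit> BY_DATE_STRING = Comparator.comparing((Hit h) -> h.modified).reversed();
    // Option 2: compare integer DB ids, assuming they grow with every change.
    static final Comparator<Hit> BY_DBID = Comparator.comparingLong((Hit h) -> h.dbid).reversed();
    // Option 3: compare a numeric timestamp property that the application keeps updated.
    static final Comparator<Hit> BY_MILLIS = Comparator.comparingLong((Hit h) -> h.modifiedMillis).reversed();

    /** Sorts a copy of the hits with the given comparator and returns their DB ids in order. */
    static List<Long> order(List<Hit> hits, Comparator<Hit> newestFirst) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(newestFirst);
        List<Long> ids = new ArrayList<>();
        for (Hit h : copy) {
            ids.add(h.dbid);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit(101, "2011-07-30T12:00:00Z"),
                new Hit(205, "2011-08-02T09:28:00Z"),
                new Hit(150, "2011-08-01T10:00:00Z"),
                new Hit(180, "2011-08-02T09:06:00Z"));
        System.out.println(order(hits, BY_DATE_STRING)); // [205, 180, 150, 101]
        System.out.println(order(hits, BY_DBID));        // [205, 180, 150, 101]
        System.out.println(order(hits, BY_MILLIS));      // [205, 180, 150, 101]
    }
}
```

All three orders agree here only because the sample ids were chosen to increase with modification time; if that assumption breaks in a real repository, option 2 silently returns a different order, while option 3 stays exact.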

Alfresco – Lucene index gone bad and storage considerations

Background

At IT Sistemi we have a couple of clients with really big Alfresco repositories. We started working with Alfresco years ago; I believe our first version was 2.1. To keep costs low for our clients, we have always used the Community edition. Over the years we have become true experts in Alfresco internals, mainly because we never had access to the support that comes with the Enterprise edition.

Basically, we had to learn all the pitfalls of scaling Alfresco ourselves, often the hard way.

One of our clients has been using Alfresco since version 2.1 (upgraded to 3.0 Labs C, and then to 3.4.c), and over the years their repository has become huge. To give you a sense of scale: they have about 2.5M documents in their repository, and the alf_node_properties table in the MySQL database has over 120M rows. The customer adds thousands of scanned documents to Alfresco each day, for a current total of about 600 GB of documents. And of course, everything needs to run smoothly. The entire setup runs on just two VMware machines (Alfresco server + MySQL server).

Performance issues

Over the years, as the repository grew, so did the number of performance issues. Since Alfresco was a black box for us for a long time, at first we tackled these performance issues mainly by raising the JVM heap size, which was easy for us to do since we were on VMware. Also, most of the “server is stuck” issues could be solved with a simple service restart. But about a year ago, while running Alfresco 3.0, we started to receive a lot of complaints about the system speed.

When this started happening, our Alfresco server was running on 4 CPUs and 12 GB of RAM, with RAID 10 for Lucene and RAID 5 for data. MySQL was also on RAID 10 disks. Judging by the hardware, the system should have performed well. But the problems continued. We were receiving either “search is slow” or “indexing is slow” tickets every couple of days, at a ratio of about 5:1 in favor of search.

We simply could not find out what the problem was. We would just see a 100% CPU spike out of nowhere, with normal disk behavior and no bottleneck other than the CPU. The problem with CPU is that you can’t just throw more of it at the problem the way you can with memory or disks: if you do that on VMware and still end up at 100% CPU, you might spread the problem to other VMs. The problem had to be solved on the software side. But where does one start after optimizing the database, rebuilding the Lucene index, defragmenting all disks, optimizing JVM settings, and clearing all the junk out of the repository?

Shit hits the fan

Things started to look really nasty. Users would click Save after editing document properties and it would take 5-10 minutes for the action to complete. But finally we saw a symptom other than the CPU: the average disk queue length on the Lucene disks was over 120 (as a rule of thumb, anything above 2 indicates that the disks can’t keep up). And we are talking about RAID 10 here. How can this be? Is something else on VMware running on the same disks and killing them?

To rule out possible VMware disk problems, we came up with another nasty strategy to give us some breathing space. We halved the JVM heap size, used a great little program called ImDisk to create a RAM disk, and put the Lucene index on it. This was safe to do because Alfresco backs up the Lucene index every night, so in case of an unexpected shutdown we could always quickly restore the index without a full index rebuild. Nasty, ugly, silly, but it worked… Unfortunately, only for a couple of days.

Users started to complain again. We couldn’t believe it: even the RAM disk couldn’t keep up with the IO demand of the Lucene index.

Light at the end of the tunnel

While struggling with the production environment, we were also looking at our options. Searching high and low on Google, we ended up on this article: http://onjava.com/onjava/2003/03/05/lucene.html. So we ran tests on the customer’s servers using the IndexTuningDemo class with different MergeFactor values. We could see some performance improvements, but again it turned out to be a dead end: all the cool properties for optimizing indexing were not available in this Alfresco version. Then it dawned on us, and we ran the IndexTuningDemo class with the Lucene JARs from Alfresco version 3.4.c. The results were amazing:

C:\temp\itsistemi - lucene - 3.4.c>"C:\Program Files\Java\jdk1.6.0_11\bin\java" -Xmx1536M -Xms1024M IndexTuningDemo 10000 10 100000
C:\DOCUME~1\dms\LOCALS~1\Temp\4\\index
Total time: 1156 ms

C:\temp\itsistemi - lucene>"C:\Program Files\Java\jdk1.6.0_11\bin\java" -Xmx1536M -Xms1024M IndexTuningDemo 10000 10 100000
C:\DOCUME~1\dms\LOCALS~1\Temp\4\\index
Total time: 77424 ms

Yup, that’s right: indexing was roughly 67 times faster (77,424 ms down to 1,156 ms). Upgrading was the way to go, no matter how much we disliked upgrading a repository with 2.4M documents in it.
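As a quick sanity check of that figure, here is the arithmetic on the two timings above in Java. “Increase” here is defined as (speedup - 1) × 100; the class name is just for illustration.

```java
import java.util.Locale;

/** Worked arithmetic for the IndexTuningDemo timings quoted above. */
public class SpeedupCalc {

    /** How many times faster the new run is than the old one. */
    static double speedup(long oldMillis, long newMillis) {
        return (double) oldMillis / newMillis;
    }

    /** Percentage increase in speed: (speedup - 1) * 100. */
    static double percentIncrease(long oldMillis, long newMillis) {
        return (speedup(oldMillis, newMillis) - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        long before = 77_424; // ms, with the Lucene JARs of the Alfresco version then in production
        long after = 1_156;   // ms, with the Lucene JARs from Alfresco 3.4.c
        System.out.printf(Locale.ROOT, "speedup: %.1fx%n", speedup(before, after));           // speedup: 67.0x
        System.out.printf(Locale.ROOT, "increase: %.0f%%%n", percentIncrease(before, after)); // increase: 6598%
    }
}
```

Put differently, the new run reaches about 6,700% of the old speed, which is an increase of about 6,600%; “roughly 67 times faster” avoids the ambiguity.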

Getting mad

So we upgraded the repository, got rid of the RAM disk, did another full index rebuild just in case, and life was good again… NOT. Only one week passed and the enemy was at the gates again. By then I was cursing Alfresco, Lucene, and everyone who built them. The CPU was again at 100%, now with some ugly blocking, and the disks were slow again.

So I fired up HotThreads to see what was going on, and guess what: the Lucene index again:

Tue 02.08.2011
09:06

106.3% CPU Usage by Thread ‘indexThread7’
4/10 snapshots sharing following 19 elements
java.io.RandomAccessFile.writeBytes(Native Method)
java.io.RandomAccessFile.write(RandomAccessFile.java:482)
org.apache.lucene.store.FSIndexOutput.flushBuffer(FSDirectory.java:588)
org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:85)
org.apache.lucene.store.BufferedIndexOutput.writeByte(BufferedIndexOutput.java:35)
org.apache.lucene.store.IndexOutput.writeChars(IndexOutput.java:106)
org.apache.lucene.index.TermInfosWriter.writeTerm(TermInfosWriter.java:133)
org.apache.lucene.index.TermInfosWriter.add(TermInfosWriter.java:108)
org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:322)
org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:289)
org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:253)
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:96)
org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:1247)
org.alfresco.repo.search.impl.lucene.index.IndexInfo$Merger.mergeIndexes(IndexInfo.java:3460)
org.alfresco.repo.search.impl.lucene.index.IndexInfo$Merger.runImpl(IndexInfo.java:2875)
org.alfresco.repo.search.impl.lucene.index.IndexInfo$AbstractSchedulable.run(IndexInfo.java:2711)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:662)
3/10 snapshots sharing following 21 elements
java.io.RandomAccessFile.readBytes(Native Method)
java.io.RandomAccessFile.read(RandomAccessFile.java:338)
org.apache.lucene.store.FSIndexInput.readInternal(FSDirectory.java:537)
org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:74)
org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java:220)
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:93)
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:34)
org.apache.lucene.store.IndexInput.readChars(IndexInput.java:107)
org.apache.lucene.index.TermBuffer.read(TermBuffer.java:67)
org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:118)
org.apache.lucene.index.SegmentMergeInfo.next(SegmentMergeInfo.java:65)
org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:293)
org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:253)
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:96)
org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:1247)
org.alfresco.repo.search.impl.lucene.index.IndexInfo$Merger.mergeIndexes(IndexInfo.java:3460)
org.alfresco.repo.search.impl.lucene.index.IndexInfo$Merger.runImpl(IndexInfo.java:2875)
org.alfresco.repo.search.impl.lucene.index.IndexInfo$AbstractSchedulable.run(IndexInfo.java:2711)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:662)
2/10 snapshots sharing following 10 elements
org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:289)
org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:253)
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:96)
org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:1247)
org.alfresco.repo.search.impl.lucene.index.IndexInfo$Merger.mergeIndexes(IndexInfo.java:3460)
org.alfresco.repo.search.impl.lucene.index.IndexInfo$Merger.runImpl(IndexInfo.java:2875)
org.alfresco.repo.search.impl.lucene.index.IndexInfo$AbstractSchedulable.run(IndexInfo.java:2711)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:662)
Unique snapshot
java.io.RandomAccessFile.readBytes(Native Method)
java.io.RandomAccessFile.read(RandomAccessFile.java:338)
org.apache.lucene.store.FSIndexInput.readInternal(FSDirectory.java:537)
org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:74)
org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java:220)
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:93)
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:34)
org.apache.lucene.store.IndexInput.readVLong(IndexInput.java:77)
org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:121)
org.apache.lucene.index.SegmentMergeInfo.next(SegmentMergeInfo.java:65)
org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:293)
org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:253)
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:96)
org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:1247)
org.alfresco.repo.search.impl.lucene.index.IndexInfo$Merger.mergeIndexes(IndexInfo.java:3460)
org.alfresco.repo.search.impl.lucene.index.IndexInfo$Merger.runImpl(IndexInfo.java:2875)
org.alfresco.repo.search.impl.lucene.index.IndexInfo$AbstractSchedulable.run(IndexInfo.java:2711)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:662)

20.3% CPU Usage by Thread ‘RMI TCP Connection(1287)-10.211.0.116’
10/10 snapshots sharing following 32 elements
sun.management.ThreadImpl.getThreadInfo1(Native Method)
sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:154)
sun.reflect.GeneratedMethodAccessor412.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
java.lang.reflect.Method.invoke(Method.java:597)
com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:167)
com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:96)
com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:33)
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
javax.management.StandardMBean.invoke(StandardMBean.java:391)
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
sun.reflect.GeneratedMethodAccessor599.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
java.lang.reflect.Method.invoke(Method.java:597)
sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
sun.rmi.transport.Transport$1.run(Transport.java:159)
java.security.AccessController.doPrivileged(Native Method)
sun.rmi.transport.Transport.serviceCall(Transport.java:155)
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:662)

1.6% CPU Usage by Thread ‘Store org.alfresco.cache.ticketsCache Spool Thread’
10/10 snapshots sharing following 4 elements
java.lang.Thread.sleep(Native Method)
net.sf.ehcache.store.DiskStore.spoolAndExpiryThreadMain(DiskStore.java:588)
net.sf.ehcache.store.DiskStore.access$800(DiskStore.java:64)
net.sf.ehcache.store.DiskStore$SpoolAndExpiryThread.run(DiskStore.java:1074)

Tue 02.08.2011
09:28

109.4% CPU Usage by Thread ‘indexThread2’
4/10 snapshots sharing following 14 elements
java.io.RandomAccessFile.readBytes(Native Method)
java.io.RandomAccessFile.read(RandomAccessFile.java:338)
org.apache.lucene.store.FSIndexInput.readInternal(FSDirectory.java:537)
org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:74)
org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java:211)
org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java:169)
org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:153)
org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:1280)
org.alfresco.repo.search.impl.lucene.index.IndexInfo$Merger.mergeIndexes(IndexInfo.java:3460)
org.alfresco.repo.search.impl.lucene.index.IndexInfo$Merger.runImpl(IndexInfo.java:2875)
org.alfresco.repo.search.impl.lucene.index.IndexInfo$AbstractSchedulable.run(IndexInfo.java:2711)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:662)
6/10 snapshots sharing following 15 elements
java.io.RandomAccessFile.writeBytes(Native Method)
java.io.RandomAccessFile.write(RandomAccessFile.java:482)
org.apache.lucene.store.FSIndexOutput.flushBuffer(FSDirectory.java:588)
org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:85)
org.apache.lucene.store.BufferedIndexOutput.writeBytes(BufferedIndexOutput.java:75)
org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java:212)
org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java:169)
org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:153)
org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:1280)
org.alfresco.repo.search.impl.lucene.index.IndexInfo$Merger.mergeIndexes(IndexInfo.java:3460)
org.alfresco.repo.search.impl.lucene.index.IndexInfo$Merger.runImpl(IndexInfo.java:2875)
org.alfresco.repo.search.impl.lucene.index.IndexInfo$AbstractSchedulable.run(IndexInfo.java:2711)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:662)

104.7% CPU Usage by Thread ‘Sess_W23_LSN24’
8/10 snapshots sharing following 4 elements
org.alfresco.jlan.netbios.win32.Win32NetBIOS.Receive(Native Method)
org.alfresco.jlan.smb.server.win32.Win32NetBIOSPacketHandler.readPacket(Win32NetBIOSPacketHandler.java:133)
org.alfresco.jlan.smb.server.SMBSrvSession.run(SMBSrvSession.java:1232)
java.lang.Thread.run(Thread.java:662)
2/10 snapshots sharing following 1 elements
java.lang.Thread.run(Thread.java:662)

20.3% CPU Usage by Thread ‘RMI TCP Connection(1323)-10.211.0.116’
10/10 snapshots sharing following 32 elements
sun.management.ThreadImpl.getThreadInfo1(Native Method)
sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:154)
sun.reflect.GeneratedMethodAccessor412.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
java.lang.reflect.Method.invoke(Method.java:597)
com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:167)
com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:96)
com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:33)
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
javax.management.StandardMBean.invoke(StandardMBean.java:391)
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
sun.reflect.GeneratedMethodAccessor612.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
java.lang.reflect.Method.invoke(Method.java:597)
sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
sun.rmi.transport.Transport$1.run(Transport.java:159)
java.security.AccessController.doPrivileged(Native Method)
sun.rmi.transport.Transport.serviceCall(Transport.java:155)
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:662)
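HotThreads reports CPU usage per thread; the RMI thread calling ThreadImpl.getThreadInfo in the dumps above suggests that it samples the JVM over JMX. Below is a minimal sketch of the same idea using only the JDK’s ThreadMXBean. It is not the HotThreads code, and the sampling interval and output format are assumptions: it records each live thread’s CPU time, waits, samples again, and ranks threads by CPU time used divided by elapsed wall-clock time.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Minimal "hot threads" sampler built on the JDK's ThreadMXBean (not HotThreads' own code). */
public class HotThreadsSketch {

    /** Returns one line per live thread, busiest first, measured over the given interval. */
    static List<String> sample(long intervalMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return report; // this JVM cannot measure per-thread CPU time
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }

        // First snapshot: CPU time in nanoseconds for every live thread.
        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id));
        }
        long start = System.nanoTime();
        Thread.sleep(intervalMillis);
        long elapsed = System.nanoTime() - start;

        // Second snapshot: CPU delta divided by wall-clock delta = share of one core.
        List<Map.Entry<String, Double>> rows = new ArrayList<>();
        for (Map.Entry<Long, Long> e : before.entrySet()) {
            long after = mx.getThreadCpuTime(e.getKey());
            ThreadInfo info = mx.getThreadInfo(e.getKey());
            if (e.getValue() < 0 || after < 0 || info == null) {
                continue; // thread ended or its CPU time is unavailable
            }
            double percent = 100.0 * (after - e.getValue()) / elapsed;
            rows.add(new AbstractMap.SimpleEntry<String, Double>(info.getThreadName(), percent));
        }
        rows.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        for (Map.Entry<String, Double> row : rows) {
            report.add(String.format(Locale.ROOT, "%.1f%% CPU usage by thread '%s'", row.getValue(), row.getKey()));
        }
        return report;
    }

    public static void main(String[] args) throws InterruptedException {
        // Print the five busiest threads over a one-second window.
        List<String> report = sample(1000);
        for (String line : report.subList(0, Math.min(5, report.size()))) {
            System.out.println(line);
        }
    }
}
```

Unlike the tool, this sketch prints no stack traces; adding mx.getThreadInfo(id, depth) for the top threads would be the natural next step.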


Solution

Well, at least now we had all those cute Lucene configuration options at our disposal. But which ones should we use? Nothing we had tried before made any real improvement. And then, finally, in the obscure universe of Alfresco documentation, I found a real gem: http://wiki.alfresco.com/wiki/Index_Merging_Performance:

It is usually a good practice that the highest-numbered INDEX entries (which contain the fewest documents, number 4 in the examples above) do not contain more than a few hundred documents. If that’s not the case, it could lead to a massive amount of IO pressure on the index directories for merging operations.

A few hundred documents? Our highest-numbered INDEX entry had 250,000 documents in it!

The default value of the lucene.indexer.mergerTargetIndexCount parameter in Alfresco is 5, so we changed it to 8 (you don’t want to go very high with this number, as it affects search speed). And voilà: after this change, our highest-numbered INDEX entry dropped to about 2,000 documents and performance was finally back to normal.
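For reference, the change is a single property override. The Java sketch below parses such a fragment with java.util.Properties simply to show its form; putting it in alfresco-global.properties is an assumption based on Alfresco’s usual override mechanism, and the class and helper names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Reads the merger setting from an alfresco-global.properties fragment (illustration only). */
public class MergerConfigCheck {

    /** Default value stated in the text above. */
    static final int DEFAULT_TARGET_INDEX_COUNT = 5;

    /** Assumed override, as it might appear in alfresco-global.properties. */
    static final String OVERRIDE =
            "# Raised from the default of 5; very high values hurt search speed\n"
            + "lucene.indexer.mergerTargetIndexCount=8\n";

    /** Returns the configured target index count, or the default when the property is absent. */
    static int targetIndexCount(String propertiesText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText));
        String value = props.getProperty("lucene.indexer.mergerTargetIndexCount");
        return value == null ? DEFAULT_TARGET_INDEX_COUNT : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) throws IOException {
        System.out.println(targetIndexCount(OVERRIDE)); // 8
        System.out.println(targetIndexCount(""));       // 5
    }
}
```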

Summary

To summarize some best practices:

  • The greatest impact on user experience in Alfresco is achieved by optimizing search/indexing behavior
  • Searching Lucene is FAST no matter how many documents you search
  • Checking permissions on documents found using Lucene is SLOW
  • Considering the above, make sure your database is working as fast as possible
  • Use RAID 10 for the database files and for the Lucene index
  • Use SSDs if possible
  • Make sure your Lucene index is configured optimally, specifically the mergerMergeFactor and mergerTargetIndexCount parameters
  • The Alfresco content store can be on slow disks and generally won’t impact user experience
  • CPU is rarely the real problem
  • JVM heap size is rarely the real problem
  • Content transformations (for instance, document previews) can be a resource hog: offload them if possible
  • HotThreads is your friend
  • org.alfresco.repo.search.impl.lucene.index.IndexInfo is your friend

Exchange 2010: enable anti-spam features on a hub transport server

By default, hub transport servers have the anti-spam features disabled. In certain situations, for example in small organizations that don’t plan to use Edge Transport servers, you will want to enable this functionality. To do so, follow these steps:

  1. Open the Exchange Management Shell.
  2. Change to the %SystemDrive%\Program Files\Microsoft\Exchange Server\V14\Scripts folder and run the following commands:
    .\Install-AntispamAgents.ps1
    Restart-Service MSExchangeTransport

  3. As a final step, for all anti-spam features to work correctly, Exchange needs to know about your internal SMTP servers, so run the following command (put your own IP addresses here; you can also specify address ranges such as 192.168.1.0/24):
    Set-TransportConfig -InternalSMTPServers 192.168.1.10,192.168.1.11

    The last step can also be done through the Exchange Management Console, where the anti-spam features should now be visible as well. For further management of the anti-spam features, refer to TechNet.

Announcement: IT Sistemi – SCOM Management Pack for Alfresco

IT Sistemi – SCOM Management Pack for Alfresco on Unix is designed to work with Alfresco Enterprise installations deployed on Unix-derived platforms, including popular Linux distributions such as Red Hat Enterprise Linux, Novell SUSE, or CentOS. The management pack can be configured to monitor remote Alfresco deployments, so it can also be used to monitor Alfresco installations deployed on the Microsoft Windows operating system.

The management pack is available for free to all interested customers for a limited time. The beta version expires on October 1st and will stop functioning on that date. Please note that we left out some functionality that is not working at the moment or that we otherwise found not ready for public distribution. To get all the management pack features, you will need to monitor an Alfresco Enterprise installation, due to limitations of the Alfresco Community version.

You can download the IT Sistemi – SCOM Management Pack for Alfresco on Unix by filling out the request form on the IT Sistemi website. Installation instructions are included with the download.

Alfresco 3.3g installation on CentOS 5.5 64-bit Linux server

The following guide will show you how to install an Alfresco ECM server on a CentOS 5.5 64-bit Linux server. The CentOS Linux distribution is among the most popular ones thanks to its well-known binary compatibility with the major commercial Linux distribution, Red Hat Enterprise Linux. CentOS is pretty much the same as RHEL, only without Red Hat’s vendor branding and artwork. The main reasons to use CentOS over other free Linux distributions are its stability and, most importantly, its hardware compatibility. Big hardware vendors like HP, Dell, or IBM tend to release Linux drivers for specific distributions only, so choosing a distribution that is fully compatible with the leading US-based Linux vendor Red Hat is likely a smart move.

Now, an important point is that Alfresco loves 64 bits, so you want to install it on 64-bit hardware and a 64-bit operating system, and use a 64-bit Java Virtual Machine. While it can certainly work on 32-bit systems, for production use I recommend a 64-bit system. Let’s begin with the Alfresco prerequisites.
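Before installing, it may be worth confirming that the JVM really is 64-bit. A small Java check follows; note that the sun.arch.data.model system property is an implementation detail of HotSpot-based (Sun/Oracle/OpenJDK) JVMs, so the sketch falls back to a heuristic on os.arch and may answer “unknown” elsewhere.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Reports whether the running JVM is 64-bit, as recommended above for Alfresco. */
public class JvmBitness {

    private static final List<String> KNOWN_32_BIT_ARCHS =
            Arrays.asList("x86", "i386", "i486", "i586", "i686");

    /**
     * Returns "64", "32" or "unknown". sun.arch.data.model is HotSpot-specific, so os.arch
     * (which on most JVMs reflects the JVM's own architecture) is used as a fallback.
     */
    static String dataModel() {
        String model = System.getProperty("sun.arch.data.model");
        if ("64".equals(model) || "32".equals(model)) {
            return model;
        }
        String arch = System.getProperty("os.arch", "").toLowerCase(Locale.ROOT);
        if (arch.contains("64")) {
            return "64";
        }
        if (KNOWN_32_BIT_ARCHS.contains(arch)) {
            return "32";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        String model = dataModel();
        if ("64".equals(model)) {
            System.out.println("OK: 64-bit JVM (" + System.getProperty("java.vm.name") + ")");
        } else {
            System.out.println("Warning: JVM data model is " + model + "; a 64-bit JVM is recommended for production");
        }
    }
}
```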

Alfresco 3.3g integration with Active Directory and Google Docs

My last article on Alfresco integration with Active Directory generated a lot of interest and, most importantly, positive feedback. That article is based on Alfresco Community version 3.2r2, so if you are using that version, please continue with that article.

This article covers the latest Community release, 3.3g. I’m going to leave out most of the explanations that you can find in the original article and focus on getting things done. You can also refer to chapter 6 of the Professional Alfresco: Practical Solutions for Enterprise Content Management (Wrox Programmer to Programmer) book or to chapter 4 of the Alfresco 3 Enterprise Content Management Implementation book.

Still, the plan is to give newcomers a clear guide to what has to be changed or, more precisely, what is domain specific, so I’ll make sure to insert “(domain specific property)” in code comments. Also, since version 3.3g supports document editing via Google Docs, I’ll cover the configuration of the Google Docs integration as well. Again, for a detailed explanation of how things work, refer to the original article, as the concept is the same. So let’s begin…

Cutting infrastructure costs using Google Apps

Messaging infrastructure is a core component of the IT systems of almost every company. E-mail communication is so prevalent today that users and managers often take it for granted that e-mail is always available to them, everywhere.

Because of legacy software, or an incomplete or even, one could say, close-minded way of thinking about e-mail, many companies are stuck with inferior e-mail solutions, be they open source or proprietary. Most importantly, in most cases those solutions cost a lot, and they often have hidden costs that you take for granted; when you add them up, you can arrive at a number that will blow your head off.

Amazon Web Services Presentation

Recently I gave a presentation on Amazon Web Services. It was quite short and gave an overview of what Amazon AWS is and how it can help organizations.

PowerPoint presentation is available for download here: Amazon-Web-Services.pptx [318 KB]

I hope that someone finds it helpful.

Site updates: Twitter integration & cleanup

First of all, I owe an apology to some of my readers for not approving and responding to their comments. Unfortunately, the e-mail notifications I receive were being blocked by my spam filter. Hopefully, this won’t happen again.

I’ve done some cleanup on the site and updated my personal info. I’ve also decided to activate the downloads section, so expect to see some new and interesting content there.

Finally, I’ve enabled Twitter integration on the blog. If you want to follow me on Twitter, please go to my page.

That’s it for the site updates, hopefully I’ll find time to post some new articles soon.

Alfresco integration with Active Directory using Kerberos

Since the first article about Alfresco’s integration with Active Directory spurred a lot of interest, I’ve decided to write a follow-up article that shows how to use Kerberos for authentication. I’ll divide the article into three main parts:

  1. Description of Kerberos authentication process
  2. Setting up requirements for Kerberos implementation
  3. Alfresco configuration parameters for Kerberos


Alfresco – 2010 Roadmap

As some of you might have seen, the Alfresco team has published a roadmap for 2010. I must say I’m very pleased with the latest Community version, 3.2r2 (actually, a couple of SVN revisions later than the official 17458, but let’s not be picky). In my opinion, it is finally what 3.0 “Stable” was supposed to be, and I’m not counting the DoD module here but the overall stability of the product.

My wishes for community edition in 2010:

  • the content store selector, currently available in the Enterprise edition only
  • better support for Windows 7
  • fine-tuning of UI controls in Share, as some are not available where they should be (hint: Edit Online on the document details page)
  • content modeling in the GUI, not just in XML files
  • the ability to chain SSO authentication with something non-SSO, e.g. NTLM SSO + LDAP, if that is even technically possible
  • faster merging of bug fixes from the Enterprise branch
  • better support from the Alfresco team in the forums, though I must say they’ve been very helpful in the last 6-9 months
  • publishing critical patches for the last stable Community release, not just telling people “it’s in HEAD”
  • abandoning the policy of keeping certain functionality in the Enterprise edition only, as far as legally possible

+ of course, everything they have in their published roadmap 😉

Alfresco, Sharepoint protocol, Office 2007, and Vista

To get Alfresco’s implementation of the Sharepoint protocol to work properly on Vista/Office 2007 with NTLM authentication, you need to tweak your registry a bit. I assume you’ve done your homework and are using fully updated versions of both Vista and Office.

Open Alfresco Share, navigate to the document library, locate an Office document, and click Edit online. Most likely, the file will open read-only. To fix this, apply the fix from Microsoft’s article KB 870853.

If you try again, the file will open fine, but when you save it you’ll get an error message stating “Word did not save the document”. To fix this, stop or disable the WebClient Windows service using the Services MMC snap-in.

Sharepoint protocol should work completely now.

Deny delete permission to space owner in Alfresco

If you are the guy/girl responsible for implementing the permissions model for your business case, you might find yourself in trouble. Imagine a scenario where the business case states that certain users should be able to create new content but not delete anything. That’s the easy one, you’ll likely say, and assign the Contributor role to those users.

Let’s check the Contributor role definition from the Alfresco wiki:

Contributor
Includes the Consumer permission group and adds AddChildren and CheckOut.
They will, by default own anything they create and have the ROLE_OWNER authority.

Hm, ROLE_OWNER looks suspicious:

“FullControl” granted to “ROLE_OWNER”
The owner (as defined by the ownable aspect, or, if the aspect is not present the node creator) is allowed all rights. This interacts with contributor for cm:content. They only need the right to create content in the default set up; all other rights come from the fact that they own the nodes they create.

To sum up the above, users that are just contributors can delete everything they create and you likely don’t want that.
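To see why, here is a toy Java model of the rule quoted above: the owner (ownable aspect, otherwise the creator) gets FullControl through ROLE_OWNER, regardless of the role assigned. All names are hypothetical and this is not Alfresco’s PermissionService; Consumer is reduced to a single READ permission.

```java
import java.util.EnumSet;
import java.util.Set;

/** Toy model of the ROLE_OWNER behaviour (hypothetical, not Alfresco's PermissionService). */
public class OwnerPermissionDemo {

    enum Permission { READ, ADD_CHILDREN, CHECK_OUT, DELETE }

    /** Contributor = Consumer (modelled here as READ only) + AddChildren + CheckOut. */
    static final Set<Permission> CONTRIBUTOR =
            EnumSet.of(Permission.READ, Permission.ADD_CHILDREN, Permission.CHECK_OUT);

    /** A node with an optional explicit owner (ownable aspect) and its creator. */
    static final class Node {
        final String owner;   // null when the ownable aspect is not present
        final String creator;

        Node(String owner, String creator) {
            this.owner = owner;
            this.creator = creator;
        }

        /** The owner as defined by the ownable aspect or, if absent, the node creator. */
        String effectiveOwner() {
            return owner != null ? owner : creator;
        }
    }

    /** The owner gets FullControl via ROLE_OWNER; everyone else only gets what the role grants. */
    static boolean hasPermission(String user, Set<Permission> roleGrants, Node node, Permission p) {
        if (user.equals(node.effectiveOwner())) {
            return true; // ROLE_OWNER -> FullControl, whatever role the user was assigned
        }
        return roleGrants.contains(p);
    }

    public static void main(String[] args) {
        Node created = new Node(null, "alice");   // alice created it, no ownable aspect
        Node someoneElses = new Node(null, "bob");
        Node ownedByCarol = new Node("carol", "alice");

        System.out.println(hasPermission("alice", CONTRIBUTOR, created, Permission.DELETE));      // true
        System.out.println(hasPermission("alice", CONTRIBUTOR, someoneElses, Permission.DELETE)); // false
        System.out.println(hasPermission("alice", CONTRIBUTOR, ownedByCarol, Permission.DELETE)); // false
    }
}
```

DELETE never appears in CONTRIBUTOR, yet alice can delete her own node: the permission comes from ownership, not from the role.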

So what can we do to fix this? There are two options, and both have certain drawbacks.

Alfresco – Tomcat as a Windows service

A question that is often asked, and that usually leaves new users puzzled, is how to run Alfresco on Apache Tomcat 6 as a Windows service. Unlike most examples found on the Internet, I’ll show you how to do it mostly through the graphical interface instead of the command line. This article assumes you have the Alfresco Tomcat bundle extracted to C:\Alfresco and Java 6 installed in the default location.

Alfresco integration with Active Directory

One of the main features of the Alfresco ECM System is the ability to integrate user authentication and synchronization with Microsoft Active Directory.

Unfortunately, the integration is not trivial and it is error prone. While I’ll explain how things work, you can also have a look at chapter 6 of the Professional Alfresco: Practical Solutions for Enterprise Content Management (Wrox Programmer to Programmer) book or at chapter 4 of the Alfresco 3 Enterprise Content Management Implementation book.

In this guide I’ll show you how to achieve full integration with Active Directory, which includes Alfresco Explorer and Alfresco Share SSO, CIFS SSO, and Active Directory (LDAP) user and group synchronization.

© 2024 Another Stranger Me
