In Section 6, the authors state that “Spyglass focuses on how to exploit file metadata properties to improve search performance and scalability.” What is being sacrificed to improve search performance and scalability? How major is the sacrifice, and what can be done to mitigate its negative effects?
In the future work section, it is said that "we propose fully distributing Spyglass across a cluster by allowing partitions to be replicated and migrated across machines, and distributing the index is a matter of effectively scaling the Spyglass index tree". What could be the other major challenges in implementing this?
Do you know of any other similar approaches suggested for metadata searching? In the paper, I see that they compare their system's performance with PostgreSQL and MySQL; I wonder why they have not compared it with the other commercial file metadata search systems they mention in Section 2.2.
One of the important features of Spyglass is collecting metadata changes, which it does using the NetApp WAFL file system, whose snapshot technology significantly increases crawling performance. Is Spyglass tied to the WAFL file system only, or can it be extended to other file systems and still deliver similar performance?
I guess the takeaway point here is the use of some kind of snapshot technology for crawling purposes, so any file system that provides this technology should be sufficient, and crawling performance is then largely independent of the file system. In other words, if the file system can take snapshots sufficiently fast, crawling performance should be roughly uniform.
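To make that idea concrete, here is a rough sketch of snapshot-based incremental crawling that is not tied to WAFL: diff the metadata of two consecutive snapshots and reindex only what changed. The `InodeMeta` record and `snapshot_diff` helper are hypothetical illustrations, not part of Spyglass or any real file system API.

```python
# Hypothetical sketch: snapshot-diff based incremental metadata crawling.
# InodeMeta and snapshot_diff are illustrative stand-ins, not a real API.

from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class InodeMeta:
    inode: int    # inode number
    path: str     # file path
    size: int     # file size in bytes
    mtime: float  # modification time
    owner: str    # file owner


def snapshot_diff(old: Iterable[InodeMeta],
                  new: Iterable[InodeMeta]) -> Tuple[List[InodeMeta], List[int]]:
    """Return (changed_or_new_inodes, deleted_inode_numbers) between two snapshots."""
    old_by_inode: Dict[int, InodeMeta] = {m.inode: m for m in old}
    new_by_inode: Dict[int, InodeMeta] = {m.inode: m for m in new}

    changed = [m for ino, m in new_by_inode.items()
               if old_by_inode.get(ino) != m]      # new or modified inodes
    deleted = [ino for ino in old_by_inode
               if ino not in new_by_inode]         # inodes removed since last snapshot
    return changed, deleted


# Usage idea: only the diff is fed to the index, so crawl cost scales with the
# number of changes rather than the size of the whole namespace, e.g.
#   changed, deleted = snapshot_diff(snap_t0_inodes, snap_t1_inodes)
#   index.update(changed); index.remove(deleted)
```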
It is said that most queries can be satisfied with a slightly stale index. If the index is stale, then it will point to outdated records; how can it still give right results?
If a certain file satisfying a search query in one of the partitions gets corrupted or becomes unreachable, how is that handled by Spyglass? Is the partition in which the file resides crawled again?
In Section 1, the authors say that search results need to be secure and that current systems either ignore file ACLs or enforce them at a significant cost. What are file ACLs?
Since the security aspect of Spyglass has been left for future work and there is no mention of enforcing file ACLs in Section 6, what can we speculate about Spyglass's efficiency at enforcing file ACLs?
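For background on the two questions above: a file ACL (access control list) is simply the per-file list of which users or groups may access it. One baseline way a search system can enforce ACLs is to post-filter every candidate result against the querying user's permissions, which is presumably the "significant cost" the paper alludes to. Below is a minimal sketch of that idea, with made-up `Acl` and `SearchHit` structures, not Spyglass code.

```python
# Illustrative sketch of post-filtering search results against per-file ACLs.
# The Acl/SearchHit structures are made up for illustration, not Spyglass APIs.

from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Acl:
    readers: Set[str] = field(default_factory=set)  # users allowed to read the file

    def allows(self, user: str) -> bool:
        return user in self.readers


@dataclass
class SearchHit:
    path: str
    acl: Acl


def filter_results(hits: List[SearchHit], user: str) -> List[SearchHit]:
    """Drop hits the querying user is not permitted to see.

    Post-filtering is simple but costs an ACL check per candidate result,
    which is why enforcing security this way can be expensive at scale.
    """
    return [h for h in hits if h.acl.allows(user)]


# Example: only files readable by "alice" survive the filter.
hits = [SearchHit("/proj/a.txt", Acl({"alice", "bob"})),
        SearchHit("/hr/salaries.xls", Acl({"hr-admin"}))]
print([h.path for h in filter_results(hits, "alice")])  # ['/proj/a.txt']
```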
The authors mention the locality ratio and how it is calculated, but how and when is the value updated? Is there any overhead in recalculating it every time?
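As a reminder of the computation itself (the update cost being asked about is keeping these counts current), here is a rough sketch assuming the paper's definition of locality ratio as the percentage of directories in which a given attribute value occurs; the function and sample data are illustrative, not taken from Spyglass.

```python
# Rough sketch: locality ratio as the percentage of directories in which a
# given attribute value occurs (assumed definition); names are illustrative.

from typing import Iterable, Set, Tuple


def locality_ratio(files: Iterable[Tuple[str, str]], value: str) -> float:
    """files is an iterable of (directory, attribute_value) pairs.

    Returns the percentage of directories that contain at least one file whose
    attribute equals `value`. A low ratio means the value is clustered in a
    small part of the namespace, so partitions can be pruned at query time.
    """
    dirs_with_value: Set[str] = set()
    all_dirs: Set[str] = set()
    for directory, attr in files:
        all_dirs.add(directory)
        if attr == value:
            dirs_with_value.add(directory)
    return 100.0 * len(dirs_with_value) / max(len(all_dirs), 1)


# Example: the extension "c" occurs in 2 of 3 directories -> ~66.7%.
sample = [("/src", "c"), ("/src", "h"), ("/docs", "txt"), ("/lib", "c")]
print(locality_ratio(sample, "c"))
```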