Add/Update UpdateRequestProcessor using SolrJ

While it is not possible to add a new UpdateRequestProcessorChain, there is a Config API to add, update, or delete an UpdateProcessor. Here is how it's done using SolrJ, assuming solrconfig.xml contains the following chain and processor:

<updateRequestProcessorChain name="dedupe" processor="myprocessor">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">id</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<updateProcessor class="solr.RemoveBlankFieldUpdateProcessorFactory" name="myprocessor"/>

// JSON command for the Config API; every inner quote must be escaped
String command = "{\"update-updateprocessor\": {\"name\": \"myprocessor\", \"class\": \"solr.FirstFieldValueUpdateProcessorFactory\", \"fieldName\": \"test_s\"}}";
// POST the command to the /config endpoint
GenericSolrRequest rq = new GenericSolrRequest(SolrRequest.METHOD.POST, "/config", null);
ContentStream content = new ContentStreamBase.StringStream(command);
rq.setContentStreams(Collections.singleton(content));
rq.process(solrClient);

This replaces the RemoveBlankFieldUpdateProcessorFactory defined above as myprocessor in solrconfig.xml with FirstFieldValueUpdateProcessorFactory.
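
Escaping the JSON command by hand is easy to get wrong, so here is a minimal JDK-only sketch that builds the same command string from its parts. The class UpdateProcessorCommand and its methods are hypothetical helpers (not part of SolrJ), and the sketch assumes the values only need quotes and backslashes escaped.

```java
import java.util.StringJoiner;

final class UpdateProcessorCommand {

    // Wrap a value in double quotes, escaping backslashes and quotes.
    // Assumption: no control characters in the input.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Build {"<operation>": {"k1": "v1", "k2": "v2", ...}} from key/value pairs.
    static String build(String operation, String... keyValues) {
        if (keyValues.length % 2 != 0) {
            throw new IllegalArgumentException("expected key/value pairs");
        }
        StringJoiner body = new StringJoiner(", ", "{", "}");
        for (int i = 0; i < keyValues.length; i += 2) {
            body.add(quote(keyValues[i]) + ": " + quote(keyValues[i + 1]));
        }
        return "{" + quote(operation) + ": " + body + "}";
    }

    public static void main(String[] args) {
        // Same command as in the post, without hand-written escapes
        System.out.println(build("update-updateprocessor",
                "name", "myprocessor",
                "class", "solr.FirstFieldValueUpdateProcessorFactory",
                "fieldName", "test_s"));
    }
}
```

The resulting string can be passed to ContentStreamBase.StringStream exactly like the hand-written command.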

Posted in Thechy Stuff | Tagged , | Leave a comment

Set HttpBasicAuth credentials through code when using SolrJ

When using Solr Basic Authentication, we need to set the HttpBasicAuth credentials at the JVM level (explained here) on the client side so that all requests are authenticated. To do this, we pass the following JVM argument:

-Dsolr.httpclient.builder.factory=org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory

and set the following HttpBasicAuth credentials in the properties file:

httpBasicAuthUser=my_username
httpBasicAuthPassword=secretPassword

But sometimes storing the above credentials in a properties file is not an option, and they have to be pulled from a secure service such as AWS Secrets Manager. In such cases we need to set the above properties in code. The following piece of code does exactly that.

final ModifiableSolrParams params = new ModifiableSolrParams();
params.set("httpBasicAuthUser", my_username);
params.set("httpBasicAuthPassword", secretPassword);
PreemptiveBasicAuthClientBuilderFactory.setDefaultSolrParams(params);
final CloseableHttpClient httpClient = HttpClientUtil.createClient(params);
final CloudSolrClient cloudSolrClient = new CloudSolrClient.Builder(Arrays.asList(solrZkHosts), Optional.empty()).withHttpClient(httpClient).build();
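
For context, HTTP Basic authentication boils down to an Authorization header carrying the Base64 of user:password, and "preemptive" means the header is sent without waiting for a challenge from the server. Here is a minimal JDK-only sketch of that header value; BasicAuthHeader is a hypothetical class for illustration, and SolrJ builds the header itself once the params above are set.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicAuthHeader {

    // Value of the Authorization header for HTTP Basic authentication:
    // "Basic " followed by Base64("user:password").
    static String value(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Placeholder credentials taken from the properties example above
        System.out.println("Authorization: " + value("my_username", "secretPassword"));
    }
}
```

Base64 is an encoding, not encryption, so the credentials are only as safe as the connection; use HTTPS between the client and Solr.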

Happy Searching!


FuzzyQuery issue with Solr RealTimeGet

It’s been more than 3 years without a post. Many things happened in these 3 years that I had never seen in my life or imagined could happen. But slowly and steadily the pandemic is receding and good days are back! So I thought I would resume with some “Hard to Solve” problems and their solutions, which are often not found on the internet.

When using RealTimeGet (/get) you can always use the FilterQuery (fq) param to filter the result documents. But when the filter is a FuzzyQuery, you get the exception below and the request fails:

ERROR o.a.solr.handler.RequestHandlerBase – java.lang.UnsupportedOperationException: Query Blended() does not implement createWeight
        at org.apache.lucene.search.Query.createWeight(Query.java:66)

If you have seen something like this, here is a quick fix.

This happens because of the following statement in the process() method of org.apache.solr.handler.component.RealTimeGetComponent, which calls the Query object’s own rewrite method once. For a FuzzyQuery the result can still be a query that does not implement createWeight (the Blended() query named in the exception), hence the exception above is thrown:

Query q = raw.rewrite(searcherInfo.getSearcher().getIndexReader());

So to fix this issue we need to write a custom Solr SearchComponent and use IndexSearcher’s rewrite method, which keeps rewriting until the query no longer changes, as shown below:

Query q = searcherInfo.getSearcher().rewrite(raw);

That should do it!


Hashing for Similarity Detection

A few months back I was researching hashing for similarity detection; this blog post is the outcome of that.

A hash function, as described on Wikipedia, is any function which maps data of arbitrary size to data of fixed size. Hashing is used almost everywhere, for different purposes: network data transfer, authentication, protecting data, de-duplication, data integrity, text/document similarity and so on.

Hashes are mostly divided into 2 groups:

>> Cryptographic Hashes

>> Non-Cryptographic Hashes

Cryptographic hashes are one-way, non-reversible hashes mainly used in security applications. Once a message is encoded into a hash, it cannot be reverse engineered back to the message except by some brute-force method, which is time consuming and difficult.
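
To make this concrete, here is a minimal sketch using SHA-256 from the JDK (my choice of algorithm for illustration; Sha256Demo is a hypothetical class name). It hashes two messages that differ by one character. With a cryptographic hash the two outputs look unrelated, which is exactly why such hashes are of no use for similarity detection.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha256Demo {

    // Hex-encoded SHA-256 digest of the UTF-8 bytes of the text.
    static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // One extra character in the input, completely different digest
        System.out.println(sha256Hex("hello world"));
        System.out.println(sha256Hex("hello world!"));
    }
}
```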

Non-cryptographic hashes, on the other hand, are mainly used for data integrity (CRC, checksums) or data similarity. They are used to check whether 2 files (one on the server and the other downloaded) are exactly the same, and in antivirus applications, spam detection, etc. Non-cryptographic hashes are also used in computer forensics for detecting similar information; we’ll speak more about this in this blog post.
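
As a small illustration of the data integrity case, here is a sketch using the CRC32 class that ships with the JDK. ChecksumDemo and its method names are hypothetical, and the byte arrays stand in for the contents of the server copy and the downloaded copy.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class ChecksumDemo {

    // CRC-32 checksum of the given bytes as an unsigned 32-bit value.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    // True when both inputs produce the same checksum.
    static boolean sameChecksum(byte[] a, byte[] b) {
        return crc32(a) == crc32(b);
    }

    public static void main(String[] args) {
        byte[] onServer = "release-1.0 contents".getBytes(StandardCharsets.UTF_8);
        byte[] downloaded = "release-1.0 contents".getBytes(StandardCharsets.UTF_8);
        byte[] corrupted = "release-1.0 c0ntents".getBytes(StandardCharsets.UTF_8);

        System.out.println("intact copy matches:    " + sameChecksum(onServer, downloaded));
        System.out.println("corrupted copy matches: " + sameChecksum(onServer, corrupted));
    }
}
```

A differing checksum proves the files differ. A matching one only makes it very likely they are the same, since a 32-bit value can collide.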

Here is a Wiki list of all the cryptographic/non-cryptographic hash functions and algorithms. What doesn’t seem to be mentioned in that list are hash functions/algorithms focused on detecting data similarity. In digital forensic investigation there is often a need to detect similar files. The first prominent algorithm, ssdeep, came out in 2006, which then led to many other whitepapers and algorithms, and terms like “context-triggered piecewise hashing (CTPH)”, “fuzzy hashing” and “similarity hashing” were coined. Following are a few which are well known in the computer forensics world.

The most prominent ones used for Data Similarity Detection are:

>> ssdeep (http://ssdeep.sourceforge.net/), based on SpamSum Algorithm

>> sdhash (http://roussev.net/sdhash/sdhash.html)

sdhash has the disadvantage that the hash size keeps increasing with the input file size, hence ssdeep looks to be the better one. Following are some other algorithms to consider:

>> mvHash-B (https://dasec.h-da.de/staff/breitinger-frank/)

>> msrsh-v2 (https://dasec.h-da.de/staff/breitinger-frank/)

>> Many others, such as TLSH, MinHash and SimHash, also exist.
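
To give a feel for how a similarity hash behaves, here is a minimal sketch of SimHash, one of the algorithms named in the list above. It is not ssdeep or sdhash, which work differently. The assumptions are mine: tokens are whitespace-separated lowercase words, every token has equal weight, and 64-bit FNV-1a is used as the per-token hash only because it is short to write. SimHashSketch is a hypothetical class name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class SimHashSketch {

    // 64-bit FNV-1a hash of a token (stand-in token hash for this sketch).
    static long fnv1a64(String token) {
        long hash = 0xcbf29ce484222325L;
        for (byte b : token.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= 0x100000001b3L;
        }
        return hash;
    }

    // Every token votes +1 or -1 on each of the 64 bit positions;
    // the fingerprint keeps the bits where the votes are positive.
    static long simHash(String text) {
        int[] votes = new int[64];
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            long h = fnv1a64(token);
            for (int bit = 0; bit < 64; bit++) {
                votes[bit] += ((h >>> bit) & 1L) == 1L ? 1 : -1;
            }
        }
        long fingerprint = 0L;
        for (int bit = 0; bit < 64; bit++) {
            if (votes[bit] > 0) {
                fingerprint |= 1L << bit;
            }
        }
        return fingerprint;
    }

    // Number of differing bits; a smaller distance suggests more similar inputs.
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    public static void main(String[] args) {
        long original = simHash("the quick brown fox jumps over the lazy dog");
        long edited = simHash("the quick brown fox jumped over the lazy dog");
        long unrelated = simHash("solr config api update processor chain");
        System.out.println("original vs edited:    " + hammingDistance(original, edited));
        System.out.println("original vs unrelated: " + hammingDistance(original, unrelated));
    }
}
```

Unlike a cryptographic hash, a small edit to the input only changes some of the votes, so fingerprints of near-duplicates tend to differ in fewer bits than fingerprints of unrelated texts.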

 

Further Reading:

Automated Evaluation of Approximate Matching Algorithms on Real Data (paper-automated_evaluation_of_approximate_matching_algorithms_on_real_data.pdf)

 


Quick Virus/Malware Removal HowTo


It’s been a long time since my last post, so I thought I’d add a simple one from my past experience. Some time ago, friends used to approach me for help getting rid of nasty viruses/malware on their laptops/desktops. In some cases the malware was installed automatically while browsing a website and posed as antivirus software: its screen would pop up again and again, claiming there was a virus on the machine and that a payment had to be made to remove it. In another case I remember, strange files were created automatically in every folder, and they kept being created until the disk was full.

The best solution to a virus/malware problem is to format and reinstall, giving yourself a fresh machine. But there are cases where you need NOT do that.

Following are some basic steps to get rid of simple viruses that got installed automatically on your Windows machine.

>> First, disconnect your laptop from the internet. Turn off your WiFi and remove the LAN cable. Just isolate the machine from the internet.

>> Then try to find the process which is running in the background or foreground using Task Manager, or use Process Explorer (https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer). Process Explorer has a drag utility which you can drag onto a window to see the corresponding process. Look for suspicious process names and check whether a process is genuine by googling its name.

>> Once you have found the malicious process, kill it. Remove the process from the startup programs in Process Explorer. Some processes do not get killed unless you first kill explorer.exe.

>> After killing the process, find the path where the program got installed, generally in C:\Program Files, and delete the malware program folder. If it does not delete, force delete it.

>> Many viruses don’t stop even after being killed; they reappear again and again. In that case, boot your Windows machine in Safe Mode without Networking and repeat the steps above; it should work. On most machines Safe Mode can be reached by pressing F8 during bootup.

Once the machine is clean, please consider installing a good antivirus (not a free one) on all your devices. I have had cases where the machine had the free version of Avast or AVG and malware still got installed automatically. I feel money spent on antivirus software does pay off in most cases, unless you are unlucky enough to be attacked before the antivirus companies catch up with the threat.


[Docker for Windows] Certificate Error Solution

Problem Definition: After installing “Docker for Windows” on a Windows 10 Professional box, any docker command you type, for example docker ps, fails with the following error:

could not read CA certificate "C:\\Users\\UserName\\.docker\\machine\\machines\\default\\ca.pem": open C:\\Users\\UserName\\.docker\machine\machines\default\ca.pem: The system cannot find the path specified.

And in the log file located at “C:\Users\UserName\AppData\Local\Docker\log.txt” you get warnings like the following:

[11:14:53.591][DockerClientEnvironmentChecker][Warning] DOCKER_HOST environment variable detected, docker may not work properly

[11:14:53.591][DockerClientEnvironmentChecker][Warning] DOCKER_TLS_VERIFY environment variable detected, docker may not work properly

 

Solution: You need to delete all DOCKER_* environment variables from your machine. This is done in 2 steps:

Step 1> Go to Control Panel\All Control Panel Items\System, then click Advanced system settings. In System Properties, go to the Advanced tab and click Environment Variables. Delete all DOCKER_* entries from the System/User variables.

Step 2> Remove the DOCKER_* variables from the command prompt or PowerShell. I used PowerShell, with the following commands:

[Environment]::SetEnvironmentVariable("DOCKER_CERT_PATH", $null, "User")

[Environment]::SetEnvironmentVariable("DOCKER_HOST", $null, "User")

[Environment]::SetEnvironmentVariable("DOCKER_MACHINE_NAME", $null, "User")

[Environment]::SetEnvironmentVariable("DOCKER_TLS_VERIFY", $null, "User")

[Environment]::SetEnvironmentVariable("DOCKER_TOOLBOX_INSTALL_PATH", $null, "User")

Now close and reopen PowerShell and run docker ps; it will work fine without any certificate error.
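
If the error persists, it helps to check whether a newly started process still sees any DOCKER_* variable. Here is a minimal sketch of such a check in Java; DockerEnvCheck is a hypothetical helper and not part of Docker. It prints every environment variable whose name starts with DOCKER_, comparing names case-insensitively because Windows treats variable names that way.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class DockerEnvCheck {

    // Return the entries whose name starts with DOCKER_, sorted by name.
    static Map<String, String> dockerVars(Map<String, String> env) {
        Map<String, String> found = new TreeMap<>();
        for (Map.Entry<String, String> entry : env.entrySet()) {
            if (entry.getKey().toUpperCase(Locale.ROOT).startsWith("DOCKER_")) {
                found.put(entry.getKey(), entry.getValue());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, String> leftovers = dockerVars(System.getenv());
        if (leftovers.isEmpty()) {
            System.out.println("No DOCKER_* variables are set.");
        } else {
            leftovers.forEach((name, value) -> System.out.println(name + "=" + value));
        }
    }
}
```

Run it from a freshly opened terminal, since a terminal that was already open keeps the environment it started with.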


Petya attack is In Progress

Just received a security advisory from Trend Micro about a Ransomware attack in progress which is said to be a variant of Petya.


Guys, Please update all your devices as a first step!

 

 


Reduce JPEG size up to 35% with Guetzli

Google has recently open sourced Guetzli, a JPEG encoder which reduces JPEG file size by up to 35%. Check it out on Github: https://github.com/google/guetzli/.

This means fewer bytes transmitted over the wire!!!


Part of Books from My Collection

[Image: books]


Allo – new AI Assistant

There has been a steep increase in research on AI and related technologies/techniques in recent days, along with another spike in the adoption of Neural Networks, now Deep Neural Networks. The recent launch of Allo by Google seems to be an outcome of that.


Allo is a chat app like any other, such as WhatsApp, but with the addition of the Google (AI) Assistant, which answers your questions, sets reminders for you, searches for places around you based on where you are located, and much more. For questions it does not have an answer to, it gives the top Google search result. It also comes with another feature that predicts a response to a chat message. I personally find it a nice handy app for setting a quick reminder or, let’s say, telling it to give me the weather forecast every morning at 10, and it does as commanded. ALLO!

Further Reading:

https://allo.google.com/

http://www.forbes.com/sites/mattdrange/2016/09/21/meet-googles-ai-assistant-behind-the-new-messaging-app-allo/#7b72b2676b57

 

 
