New AI-based Features on the RP Photonics Website
Posted on 2025-02-07 as part of the Photonics Spotlight (available as e-mail newsletter!)
Permanent link: https://www.rp-photonics.com/spotlight_2025_02_07.html
Author: Dr. Rüdiger Paschotta, RP Photonics AG
Abstract: RP Photonics has introduced several AI-based features on its website for the benefit of users and advertisers. The technical background is explained.
Authors: Gareth Moore and Rüdiger Paschotta
The RP Photonics website is famous for its massive amount of high-quality content, with over 1000 encyclopedia articles, many blog articles, tutorials, case studies and more. We mainly focus on the quality of content, but also don't neglect additional aspects like usability. Now, we have substantially improved usability by utilizing artificial intelligence:
- Semantic full text article search: The search tool of the Encyclopedia no longer finds articles only based on entered keywords, which would need to occur literally in the text, but works based on semantic similarity (which we explain in some depth below). You may now enter things like “how to make femtosecond pulses shorter”, and that way find articles on pulse compression (even if you were not aware of the term “compression”).
- By the way, the search can be configured to include (or not include) various other types of articles, e.g. blog articles, tutorials and case studies. One can also include product descriptions of our advertisers, which we think advertisers will love!
- Product search: When searching for suppliers of some item in our Buyer's Guide, the first step is usually to identify a suitable product category in order to get the list of suppliers for that. It is not always obvious which category to choose, but by applying semantic search here as well, we now make it substantially easier for our users. In addition, they get search results based on product descriptions of our advertisers.
- Assistant for improving product descriptions: The key feature of our advertising package is to publish product descriptions, which then appear (a) in the lists of suppliers for the chosen product categories, (b) on the supplier's profile page, and (c) even in related encyclopedia articles. The description texts should be optimized for best effect, and here our new AI-based assistant can now help a lot. It has been instructed to consider all the advice which we usually give to our clients. We have integrated that assistant very conveniently into the forms (e.g. the registration form for new suppliers): You can see the modified description, compare it with your original one, and edit it further.
- This feature will be particularly helpful for people who are not too strong in the English language, but it can also produce useful ideas for others.
- The next Spotlight article explains another tool: AI-based features for implementing structured purchasing processes.
In the following, we provide detailed background information on semantic search.
The Need for a Meaning-based Approach
Traditional search relies on simple keyword matching. This implies that searching for “optical attenuation” would never surface an article on “light absorption”, despite the conceptual relation: absorption is what causes attenuation (among other factors). Even a full-text search may fail if the entered search term does not occur in a relevant article because the author uses another term with the same or similar meaning.
Therefore, we need a search system that understands meaning rather than just words; that is called semantic search. It ranks the relevance of content based on semantic or conceptual closeness rather than on exact phrasing. That has become possible with techniques of artificial intelligence and natural language processing, which involve processing huge amounts of text data in sophisticated ways. Using such technology, we can now better match the user's actual needs.
Encoding Meaning: The Concept of Semantic Space
To understand how semantic search works, we first need to consider how meaning can be encoded. Machine learning models are trained to convert words, sentences, or paragraphs into numerical representations known as semantic vectors, which exist in a high-dimensional, so-called “semantic space”. The closer some words or sentences are in terms of meaning and related concepts, the more similar the corresponding semantic vectors are in direction and magnitude.
A classic example of this is the relationship between “king” and “queen,” or “man” and “woman”. These pairs are “embedded” near each other in semantic space. For example, the difference between “king” and “queen” is rather small, while “king” is far from “cling” despite the similarity of the character sequences. This principle extends beyond simple word analogies: “laser” and “optical amplification” are also quite close in a photonics-related semantic space.
Once the meaning is encoded with such vectors, search becomes a matter of comparison. The first step is to store the semantic vectors for the entire content of our Encyclopedia, for example in a database. Then, when a user enters some search term(s), we produce a semantic vector for that input, and subsequently search the database for the vectors closest to it.
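The comparison step described above can be illustrated with a minimal sketch. It is not our actual implementation: the three-dimensional vectors are invented for illustration (real embedding models produce vectors with hundreds or thousands of dimensions), the names `INDEX` and `search` are hypothetical, and cosine similarity is assumed as the closeness measure, which is a common choice but compares only the direction of the vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, index, top_k=2):
    """Return the titles of the top_k stored vectors closest to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), title)
        for title, vector in index.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:top_k]]


# Toy "database" of semantic vectors; the numbers are made up for illustration.
INDEX = {
    "pulse compression": [0.9, 0.1, 0.2],
    "fiber lasers": [0.1, 0.9, 0.3],
    "laser cutting": [0.2, 0.6, 0.8],
}

# Pretend this is the embedding of "how to make femtosecond pulses shorter".
query = [0.8, 0.2, 0.1]
results = search(query, INDEX)
print(results)
```

In a real system, a vector database would replace the dictionary and the linear scan, but the principle of ranking by vector closeness stays the same.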
That's at least the basic principle of the applied methods. For good results, some more work needs to be done, as explained in the following.
Challenges in Implementing Semantic Search
While semantic search offers a powerful and seemingly straightforward solution, implementing it effectively presents several challenges. These challenges come from both the natural complexity and ambiguity of language and the technical hurdles of automating the initial processing of long and complex texts.
Language Ambiguity
General language ambiguity is the first challenge. Many words and phrases have multiple meanings, depending on the context in which they appear. The word “fiber” could refer to dietary fiber, optical fiber, or textiles, potentially leading to a wrong embedding in the semantic space. Without a mechanism for understanding context, the system risks returning irrelevant information. At least, that problem is smaller when searching a body of text which focuses on a specific area. For example, we have a lot on fiber optics, while not elaborating on dietary fiber anywhere.
Semantic Chunking
Semantic search also requires a mechanism for breaking down text into meaningful sections, called chunks, before semantic embedding. Simply chopping text into fixed-length pieces, or even single sentences, risks losing coherence, while overly long sections dilute specificity. Striking the right balance is crucial: relevant context must be preserved while trying to have each chunk encapsulate one idea. And of course, that chunking needs to be automated: We don't have time to manually chunk thousands of articles.
To address these issues, we take a structured approach. First, we break an encyclopedia article into paragraphs, leveraging the natural human chunking provided by the author. Next, we enrich these chunks with contextual information, incorporating the article title, the relevant concept definition from the metadata, and section headings to maintain context. Finally, we embed the chunks, calculate the similarity between consecutive paragraphs in semantic space and, if they are close enough in meaning, merge them into a single unit. The combined chunks are then re-embedded; the whole process is often referred to as semantic chunking. It tries to preserve context while reducing fragmentation, improving the meaningfulness of the embeddings. In this way, we are able to create a database of chunks or snippets of meaning linked to each part of the encyclopedia.
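The merging step can be sketched as follows. This is a simplified illustration under stated assumptions, not our production code: `toy_embed` is a hypothetical stand-in for a real embedding model (it merely counts words), the threshold value is arbitrary, and for simplicity the context (title and heading) is added only before the final re-embedding, because a word-count stand-in would otherwise see the shared title as similarity.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a word-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_chunks(title, heading, paragraphs, threshold=0.3, embed=toy_embed):
    """Merge consecutive paragraphs that are close in meaning, then re-embed."""
    if not paragraphs:
        return []
    vectors = [embed(p) for p in paragraphs]
    groups = [[paragraphs[0]]]
    # Compare each paragraph with its predecessor.
    for prev_vec, vec, para in zip(vectors, vectors[1:], paragraphs[1:]):
        if cosine(prev_vec, vec) >= threshold:
            groups[-1].append(para)  # close enough: extend the current chunk
        else:
            groups.append([para])  # otherwise start a new chunk
    chunks = []
    for group in groups:
        # Enrich with context, then embed the combined chunk again.
        text = f"{title} | {heading} | " + " ".join(group)
        chunks.append({"text": text, "vector": embed(text)})
    return chunks


paragraphs = [
    "fiber lasers use doped fiber as gain medium",
    "doped fiber gain medium is pumped by laser diodes",
    "privacy rules apply to website visitors",
]
chunks = semantic_chunks("Fiber Lasers", "Overview", paragraphs)
for chunk in chunks:
    print(chunk["text"])
```

With these toy inputs, the first two paragraphs share enough vocabulary to be merged, while the third one starts a new chunk.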
Working out the details of that chunking turned out to be the major challenge, consuming much of the overall time to implement semantic search.
Reranking
To a general-purpose semantic embedding model, a good deal of photonics-related texts could seem rather similar, even with a strong embedding protocol. A search for “how to simulate ASE in a pulse amplifier” could return the “Amplified Spontaneous Emission” article before the (slightly more appropriate) tutorial chapter “Modelling of Pulse Amplification, part 5: Amplified Spontaneous Emission”.
While both results are very appropriate, we can better assess the contextual relevance with an additional reranking process, powered by a large language model (LLM), which can judge relevance more intelligently than a pure vector comparison. For instance, a search for “fiber lasers for cutting” might initially prioritize the general article on “fiber lasers”, but reranking prioritizes content specific to cutting applications like the “laser cutting” article.
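One possible way to organize such a reranking step is sketched below. All names (`build_rerank_prompt`, `apply_ranking`) and the prompt wording are hypothetical, and the actual call to a language model is deliberately left out; the model's reply is replaced by a hand-written string. The sketch only shows how first-stage candidates could be presented to a model and how its answer could be applied robustly.

```python
def build_rerank_prompt(query, candidates):
    """Build a prompt asking a language model to order numbered candidates."""
    lines = [f"{i}. {title}" for i, title in enumerate(candidates, start=1)]
    return (
        f"Query: {query}\n"
        "Candidates:\n" + "\n".join(lines) + "\n"
        "Reply with the candidate numbers, most relevant first, "
        "separated by commas."
    )


def apply_ranking(candidates, reply):
    """Reorder candidates according to a reply such as '2, 1'."""
    order = []
    for token in reply.split(","):
        token = token.strip()
        if token.isdecimal():
            index = int(token) - 1
            # Ignore numbers that are out of range or repeated.
            if 0 <= index < len(candidates) and index not in order:
                order.append(index)
    # Keep any candidates the reply omitted, in their original order.
    order += [i for i in range(len(candidates)) if i not in order]
    return [candidates[i] for i in order]


# Result of the first-stage vector search (toy data).
candidates = ["fiber lasers", "laser cutting", "fiber amplifiers"]
prompt = build_rerank_prompt("fiber lasers for cutting", candidates)

# Hand-written stand-in for what a language model might answer.
reranked = apply_ranking(candidates, "2, 1")
print(reranked)
```

Treating the model's reply as untrusted text, and falling back to the original order for anything it omits or garbles, keeps the search usable even when the reply is malformed.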
Energy Concerns
We are concerned about the huge amount of electricity which is consumed by AI features, as that has substantial environmental and other impacts. Therefore, we consider that aspect in what we do. Fortunately, our implementation is not energy-intensive. For example, semantic embedding even of the whole Encyclopedia is not a big computational task, let alone the embedding of a limited number of search inputs. That is also reflected by the quite low cost of the external AI services used; the cost of our AI features is completely dominated by the working time required for developing them, while running costs are negligible.
Privacy Matters
Companies like Google, Amazon, and Meta utilize massive data collection, including user data. Many more or less ignore the problematic privacy implications of that, but as a privacy-conscious company, RP Photonics leverages modern search techniques without relying on problematic user tracking. Some details:
- The IP addresses of our website users, which to some extent could be used to track them, are not stored, and are also not transmitted to the US-based AI servers of Pinecone and OpenAI. (These see only the IP address of our web server, to which they respond.) So unless a user enters something which could be used to identify them (which we consider unlikely), full privacy will be maintained. In particular, you will not receive spam mails after having indicated an interest in certain products on our site.
- We could, in principle, store the users' search terms in order to further optimize our features. However, we have not done so until now, and would not start doing so without indicating that below the input field.
Security Concerns
When implementing such features on a website, one should be careful not to introduce serious security issues. In particular, wherever user input is processed, which might contain malicious code, one must make sure that such code cannot be executed. For example, it could be fatal to let the web server execute some malicious PHP code. Another attack vector would involve cross-site scripting, where JavaScript code can be executed in the browsers of other users.
There is a substantial number of ways in which innocent-looking code can allow attacks with possibly disastrous consequences. Therefore, we would discourage anyone from implementing such functionality without having a proper understanding of these issues.
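As a small illustration of one such precaution, the following sketch escapes user input before echoing it into an HTML page, so that injected markup is displayed as text rather than executed. It uses Python's standard `html.escape` purely for illustration; the function name `render_search_echo` is hypothetical, and this covers only one aspect of input handling, not a complete defense.

```python
import html


def render_search_echo(user_input):
    """Echo a search term into HTML with special characters neutralized."""
    safe = html.escape(user_input, quote=True)
    return f"<p>Results for: {safe}</p>"


# A typical cross-site scripting attempt entered into a search field.
page = render_search_echo('<script>alert("x")</script>')
print(page)
```

In PHP, the analogous standard function would be `htmlspecialchars`.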
By the way, there are great AI tools to identify and fix security issues. We often utilize AI also for that purpose.
Conclusions
We expect that users will greatly appreciate the enhanced functionality. But nothing is perfect, and feedback helping us to further improve the features is warmly welcome.
A key feature is semantic search, which we now apply to both article search and the search for suitable product categories for supplier search. We have explained the principles of these search features and some of the complications, which required substantial time to be spent. Still, we believe these efforts are worthwhile.
Our general attitude towards artificial intelligence is to use it in the best possible way to generate value for our users and advertisers (while others waste resources or abuse the technology for manipulation). Still, expert human knowledge remains the crucial element of value generation, and we think that will stay so for the foreseeable future. We will certainly not engage in AI-based mass production of content of dubious quality (affected by hallucinations and relying on unverified claims), which unfortunately is becoming more and more common. We hope that both our huge user community and our advertisers will increasingly appreciate the value of expert human knowledge, which is now better utilized with the help of AI.
This article is a posting of the Photonics Spotlight, authored by Dr. Rüdiger Paschotta. You may link to this page and cite it, because its location is permanent. See also the RP Photonics Encyclopedia.
Note that you can also receive the articles in the form of a newsletter or with an RSS feed.