SlideShare feed for Slideshows by User: kevig (http://www.slideshare.net). Last updated Wed, 18 Dec 2024 09:06:23 GMT.

TEXT DATA LABELLING USING TRANSFORMER BASED SENTENCE EMBEDDINGS AND TEXT SIMILARITY FOR TEXT CLASSIFICATION
Wed, 18 Dec 2024 09:06:23 GMT | /slideshow/text-data-labelling-using-transformer-based-sentence-embeddings-and-text-similarity-for-text-classification-1759/274179415
This paper demonstrates that much of the time, cost, and complexity otherwise spent labelling text data for classification can be saved. The AI community recognizes the importance of labelled data for various NLP applications. Here, we labelled and categorized close to 6,000 unlabelled samples into five distinct classes; this labelled dataset was then used for multi-class text classification. Labelling the data with transformer-based sentence embeddings and a cosine-based text similarity threshold saved roughly 20-30 days of human effort and multiple rounds of human validation, with 98.4% of classes correctly labelled as per business validation. Text classification on this AI-labelled data achieved both an accuracy and an F1 score of 90%.
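
The core labelling step the abstract describes (embed each unlabelled text and each class description with a pre-trained sentence transformer, then assign the most similar class above a cosine threshold) can be sketched as follows. This is an illustration, not the authors' code: the model name, the class descriptions, and the threshold value are assumptions.

```python
# Minimal sketch: label texts by cosine similarity between sentence
# embeddings of the texts and of one anchor description per class.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

class_descriptions = {            # hypothetical classes and anchors
    "billing": "questions about invoices, charges and payments",
    "delivery": "questions about shipping status and delays",
}
labels = list(class_descriptions)
class_emb = model.encode(list(class_descriptions.values()), convert_to_tensor=True)

def label(texts, threshold=0.5):  # threshold would be tuned on a sample
    text_emb = model.encode(texts, convert_to_tensor=True)
    sims = util.cos_sim(text_emb, class_emb)      # (n_texts, n_classes)
    out = []
    for row in sims:
        score, idx = row.max(dim=0)
        name = labels[int(idx)] if score >= threshold else None
        out.append((name, float(score)))          # None = route to a human
    return out

print(label(["My invoice shows a double charge."]))
```
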
UNSCRAMBLING CODES: FROM HIEROGLYPHS TO MARKET NEWS
Wed, 11 Dec 2024 06:59:30 GMT | /slideshow/unscrambling-codes-from-hieroglyphs-to-market-news-72b0/273985528
This paper reviews some of the steps that paved the way for the development of sentiment analysis (or opinion mining), a technique apparently used by Jim Simons' Medallion fund to score a seemingly impossible performance: a 66% average annual rate of return over the 31 years between 1988 and 2018. Sentiment analysis is a powerful tool that uses natural language processing (NLP), or computational linguistics, to determine whether a text about a company is positive, negative, or neutral and, ultimately, to discover stock price patterns. Humans have always used symbols to communicate, plainly or secretively. Here we review some of the methods used over the past centuries, including Egyptian hieroglyphs, Julius Caesar's cipher, Fibonacci's abbreviations, Leonardo da Vinci's mirror writing, and Mary Stuart's code. The intention is to describe some passages of the long journey human beings made to arrive at today's sophisticated IT tools for sentiment analysis.
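
Among the historical methods surveyed, Caesar's cipher is the one most easily stated as an algorithm: shift every letter a fixed number of places along the alphabet. A minimal sketch, for illustration only:

```python
# Caesar's cipher: shift each letter k places along the alphabet
# (k=3 is the shift traditionally attributed to Caesar).
def caesar(text, k=3):
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + k) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

print(caesar("market news"))        # -> 'pdunhw qhzv'
print(caesar("pdunhw qhzv", k=-3))  # shifting back decrypts
```
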
Integration of Phonotactic Features for Language Identification on Code-Switched Speech
Wed, 04 Dec 2024 04:02:25 GMT | /slideshow/integration-of-phonotactic-features-for-language-identification-on-code-switched-speech/273822085
In this paper, phoneme sequences are used as language information to perform code-switched language identification (LID). With a one-pass recognition system, the spoken sounds are converted into phonetically arranged sound sequences. The acoustic models, emulating multiple hidden Markov models (HMMs), are robust enough to handle multiple languages. To determine phoneme similarity among our target languages, we report two methods of phoneme mapping. Statistical phoneme-based bigram language models (LMs) are integrated into speech decoding to eliminate possible phone mismatches. A supervised support vector machine (SVM) learns to recognize the phonetic information of mixed-language speech from the recognized phone sequences. Since the back-end decision is taken by the SVM, the likelihood scores of segments with monolingual phone occurrences are used to classify language identity. The speech corpus was tested on Sepedi and English, two languages that are often mixed. We evaluate the system by measuring ASR performance and LID performance separately. The systems obtained promising ASR accuracy with a data-driven phone-merging approach modelled with 16 Gaussian mixtures per state, and achieved acceptable ASR and LID accuracy on code-switched and monolingual speech segments respectively.
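
The SVM back-end described above operates on recognized phone sequences. A minimal sketch of that step follows, with toy space-separated phone strings and language labels standing in for real decoder output; the feature design here (bag of phone bigrams) is an assumption, not the paper's exact features.

```python
# Minimal sketch: language ID from phone sequences via an SVM over
# phone-bigram counts. Data below are toy stand-ins for ASR output.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

phone_seqs = ["dh ah k ae t", "hh eh l ow", "s e p e d i", "d u m e l a"]
langs = ["eng", "eng", "sep", "sep"]

lid = make_pipeline(
    # token_pattern keeps single-character phones as tokens
    CountVectorizer(token_pattern=r"(?u)\S+", ngram_range=(2, 2)),
    LinearSVC(),
)
lid.fit(phone_seqs, langs)
print(lid.predict(["s e p e l a"]))  # -> ['sep'] on this toy data
```
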
WARRANTS GENERATIONS USING A LANGUAGE MODEL AND A MULTI-AGENT SYSTEM
Wed, 27 Nov 2024 05:29:26 GMT | /slideshow/warrants-generations-using-a-language-model-and-a-multi-agent-system/273634275
Each argument begins with a conclusion, which is followed by one or more premises supporting it. The warrant is a critical component of Toulmin's argument model: it explains why the premises support the claim. Despite its critical role in establishing the claim's veracity, it is frequently omitted or left implicit, leaving readers to infer it. We consider the problem of producing more diverse and higher-quality warrants in response to a claim and evidence. First, we employ BART [1] as a conditional sequence-to-sequence language model to guide the output generation process, fine-tuning it on the ARCT dataset [2]. Second, we propose the Multi-Agent Network for Warrant Generation, a model that produces more diverse and higher-quality warrants by combining reinforcement learning (RL) and generative adversarial networks (GANs) with a mechanism of mutual awareness among agents. Our model generates a greater variety of warrants than the baseline models, and the experimental results validate the effectiveness of the proposed hybrid model.
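
The first stage, conditioning BART on a claim plus its evidence to generate a warrant, can be sketched with the Hugging Face transformers API. This shows the setup only: the checkpoint, the separator convention, and the example texts are assumptions, and in the paper the model is fine-tuned on ARCT before generation.

```python
# Sketch of conditional warrant generation with BART (illustrative;
# fine-tuning on ARCT claim/evidence/warrant triples would precede
# this for the generations to be meaningful).
from transformers import BartForConditionalGeneration, BartTokenizer

tok = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

claim = "Public libraries remain essential."       # toy claim
evidence = "Library visits rose 10% last year."    # toy evidence
inputs = tok(claim + " </s> " + evidence, return_tensors="pt")

out = model.generate(**inputs, num_beams=4, max_new_tokens=40)
print(tok.decode(out[0], skip_special_tokens=True))
```
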
QUESTION ANSWERING MODULE LEVERAGING HETEROGENEOUS DATASETS
Wed, 20 Nov 2024 04:23:14 GMT | /slideshow/question-answering-module-leveraging-heterogeneous-datasets/273453673
Question answering has been a well-researched NLP area in recent years. It has become necessary for users to be able to query the variety of information available, be it structured or unstructured. In this paper, we propose a question answering module which a) can consume a variety of data formats through a heterogeneous data pipeline that ingests data from product manuals, technical data forums, internal discussion forums, groups, etc.; b) addresses practical challenges faced in real-life situations by pointing to the exact segment of the manual or chat thread that can solve a user query; and c) provides segments of text when deemed relevant, based on the user query and business context. Our solution provides a comprehensive pipeline composed of elaborate data ingestion, data parsing, indexing, and querying modules, and is capable of handling a plethora of data sources such as text, images, tables, community forums, and flow charts. Our studies on a variety of business-specific datasets demonstrate the necessity of custom pipelines like the proposed one for solving real-world document question-answering tasks.
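
The index-and-query core of such a pipeline (split ingested documents into segments, embed them, return the best-matching segment for a query) can be sketched as follows. This is a minimal illustration under assumed tooling, a sentence-transformers model; the paper's pipeline additionally parses images, tables, and flow charts.

```python
# Minimal sketch: retrieve the manual/forum segment that best matches
# a user query, using sentence embeddings and semantic search.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

segments = [  # toy stand-ins for parsed manual and forum segments
    "Manual 3.2: hold the reset button for 5 seconds to restore defaults.",
    "Forum thread: error E42 usually means the filter needs cleaning.",
]
seg_emb = model.encode(segments, convert_to_tensor=True)

def answer(query, top_k=1):
    q_emb = model.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, seg_emb, top_k=top_k)[0]
    return [(segments[h["corpus_id"]], round(h["score"], 3)) for h in hits]

print(answer("How do I fix error E42?"))
```
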
STRESS TEST FOR BERT AND DEEP MODELS: PREDICTING WORDS FROM ITALIAN POETRY
Wed, 13 Nov 2024 04:17:22 GMT | /slideshow/stress-test-for-bert-and-deep-models-predicting-words-from-italian-poetry-65a1/273259254
In this paper we present a set of experiments carried out with BERT on Italian sentences taken from the poetry domain. The experiments are organized around the hypothesis of a very high level of difficulty in predictability at the three levels of linguistic complexity we monitor: the lexical, syntactic, and semantic levels. To test this hypothesis we ran the Italian version of BERT on 80 sentences, for a total of 900 tokens, mostly extracted from Italian poetry of the first half of the last century. We then alternated canonical and non-canonical versions of the same sentence before processing them with the same DL model, and also used sentences from the newswire domain containing similar syntactic structures. The results show that the DL model is highly sensitive to the presence of non-canonical structures; however, DL models are also very sensitive to word frequency and to local non-literal compositional meaning effects. This is also apparent in the preference for predicting function words over content words, and collocates over infrequent word phrases. In the paper, we focus on BERT's use of subword units for out-of-vocabulary words.
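
The probing task is word prediction with a masked language model. A minimal sketch of how such predictions are obtained, assuming one publicly available Italian BERT checkpoint (not necessarily the one used in the paper) and a toy sentence pair in canonical and non-canonical order:

```python
# Minimal sketch: compare the model's top word prediction for the
# same (toy) sentence in canonical vs. non-canonical word order.
from transformers import pipeline

fill = pipeline("fill-mask", model="dbmdz/bert-base-italian-cased")

for sent in ["Il poeta scrive una [MASK].",   # canonical SVO order
             "Una [MASK] scrive il poeta."]:  # fronted object
    best = fill(sent, top_k=1)[0]
    print(sent, "->", best["token_str"], round(best["score"], 3))
```
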
Genetic Approach For Arabic Part Of Speech Tagging
Thu, 03 Oct 2024 04:18:58 GMT | /slideshow/genetic-approach-for-arabic-part-of-speech-tagging-5acd/272158688
With the growing number of textual resources available, the ability to understand them becomes critical. An essential first step in understanding these sources is the ability to identify the parts of speech in each sentence. Arabic is a morphologically rich language, which makes part-of-speech tagging challenging. In this paper, our goal is to propose, improve, and implement a part-of-speech tagger based on a genetic algorithm. The accuracy obtained with this method is comparable to that of other probabilistic approaches.
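
The genetic formulation treats a tag sequence as a chromosome and evolves a population toward assignments that fit the lexicon and read fluently. A toy sketch of that loop follows; the fitness function, the operators, and the tiny English lexicon standing in for Arabic data are all illustrative simplifications.

```python
# Toy genetic algorithm for POS tagging: chromosomes are tag
# sequences; fitness rewards lexicon fit and plausible tag bigrams.
import random

words = ["the", "cat", "sleeps"]
lexicon = {"the": {"DET"}, "cat": {"NOUN", "VERB"}, "sleeps": {"VERB", "NOUN"}}
tagset = ["DET", "NOUN", "VERB"]
good_bigrams = {("DET", "NOUN"), ("NOUN", "VERB")}

def fitness(ch):
    lex = sum(t in lexicon[w] for w, t in zip(words, ch))
    flu = sum(b in good_bigrams for b in zip(ch, ch[1:]))
    return lex + flu

def evolve(pop_size=30, gens=50, mut=0.1):
    pop = [[random.choice(tagset) for _ in words] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]               # selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = random.sample(elite, 2)
            cut = random.randrange(1, len(words))  # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < mut:              # point mutation
                child[random.randrange(len(words))] = random.choice(tagset)
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

print(evolve())  # -> ['DET', 'NOUN', 'VERB'] with high probability
```
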
Rule Based Transliteration Scheme for English to Punjabi
Wed, 25 Sep 2024 11:48:49 GMT | /slideshow/rule-based-transliteration-scheme-for-english-to-punjabi-51a2/272013289
Machine transliteration has emerged as an important research area in the field of machine translation. Transliteration aims to preserve the phonological structure of words, and proper transliteration of named entities plays a significant role in improving the quality of machine translation. In this paper we perform machine transliteration for the English-Punjabi language pair using a rule-based approach. We constructed rules for syllabification, the process of extracting or separating syllables from words, and we calculate probabilities for named entities (proper names and locations). For words that do not fall under the category of named entities, separate probabilities are calculated from relative frequency using the statistical machine translation toolkit MOSES. Using these probabilities we transliterate input text from English to Punjabi.
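
Syllabification is the rule-based core of the scheme. A toy sketch of one such rule (split a word before a consonant-plus-vowel onset) is below; the paper's actual rule set is richer, and the resulting syllables' transliteration probabilities would then be estimated as relative frequencies over aligned name pairs.

```python
# Toy syllabification rule: insert a break after a vowel when a
# consonant followed by a vowel starts the next syllable (V-CV).
import re

VOWELS = "aeiou"
ONSET = re.compile(rf"([{VOWELS}])(?=[^{VOWELS}][{VOWELS}])")

def syllabify(word):
    return ONSET.sub(r"\1-", word.lower()).split("-")

for name in ["punjabi", "patiala", "ludhiana"]:
    print(name, "->", syllabify(name))
# e.g. punjabi -> ['punja', 'bi'], patiala -> ['pa', 'tia', 'la']
```
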
Interlingual Syntactic Parsing: An Optimized Head-Driven Parsing for English to Indian Language Machine Translation
Wed, 18 Sep 2024 04:19:11 GMT | /slideshow/interlingual-syntactic-parsing-an-optimized-head-driven-parsing-for-english-to-indian-language-machine-translation-3e1c/271864743
In the era of Artificial Intelligence (AI), significant progress has been made in enabling machines to understand and communicate in human languages. Central to this progress are parsers, which play a vital role in syntactic analysis and support various natural language processing (NLP) applications, including machine translation and sentiment analysis. This paper introduces a robust implementation of an optimized head-driven parser designed to advance NLP capabilities beyond the limitations of traditional Lexicalized Tree Adjoining Grammar (L-TAG) based parsers. Traditional parsers, while effective, often struggle to capture the complexities of natural languages, especially in translation from English to Indian languages. By leveraging a bi-directional approach and head-driven techniques, this research offers a substantial enhancement of parsing frameworks. The method not only improves performance in syntactic analysis but also facilitates complex tasks such as discourse analysis and semantic parsing. The research evaluates the bi-directional parser on a dataset of 15,000 sentences, showing a reduction in derivation variations compared to conventional TAG parsers. This advancement highlights how head-driven parsing can overcome traditional constraints and provide more reliable linguistic analysis. The paper demonstrates how the new implementation builds on the strengths of L-TAG while addressing its limitations, expanding the scope of Tree Adjoining Grammar-based methodologies and advancing the field of machine translation.
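
The head-driven, bi-directional idea is to anchor the parse at a lexical head and attach arguments on both sides, rather than scanning strictly left to right. The toy sketch below illustrates only that control strategy; the paper's parser operates over L-TAG elementary trees, which this deliberately omits.

```python
# Toy head-driven parse: find the verbal head first, then attach
# arguments bidirectionally (leftward and rightward) around it.
LEXICON = {  # hypothetical entries; a head lists expected arguments
    "eats": {"pos": "V", "left": ["NP"], "right": ["NP"]},
    "ravi": {"pos": "NP"},
    "rice": {"pos": "NP"},
}

def head_driven_parse(tokens):
    head_i = next(i for i, t in enumerate(tokens) if LEXICON[t]["pos"] == "V")
    entry = LEXICON[tokens[head_i]]
    left = [t for t in tokens[:head_i] if LEXICON[t]["pos"] in entry["left"]]
    right = [t for t in tokens[head_i + 1:] if LEXICON[t]["pos"] in entry["right"]]
    return {"head": tokens[head_i], "left_args": left, "right_args": right}

print(head_driven_parse(["ravi", "eats", "rice"]))
# -> {'head': 'eats', 'left_args': ['ravi'], 'right_args': ['rice']}
```
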
A REVIEW OF PROMPT-FREE FEW-SHOT TEXT CLASSIFICATION METHODS
Thu, 12 Sep 2024 11:23:21 GMT | /slideshow/a-review-of-prompt-free-few-shot-text-classification-methods-808e/271755238
Text-based comments play a crucial role in providing feedback for various industries. However, effectively filtering and categorizing this feedback against custom, context-specific criteria requires sophisticated language modeling techniques. While traditional approaches have shown effectiveness, they often require a substantial amount of data to compensate for their modeling deficiencies. In this work, we highlight the performance and limitations of prompt-free few-shot text classification using open-source pre-trained sentence transformers. On the one hand, our research includes a comprehensive study across different benchmark datasets, encompassing nine dimensions such as sentiment analysis, topic modeling, grammatical acceptability, and emotion classification, and we ran a series of experiments to test prompt-free few-shot text classification. On the other hand, we underline the limitations of prompt-free few-shot classification when the targeted criteria are complex. As an alternative approach, prompting an instruction-fine-tuned language model demonstrated favorable outcomes, as shown by our application to the specific use case of identifying and extracting resolution results and actions from explanatory notes, achieving an accuracy rate of 80%.
Text-based comments play a crucial role in providing feedback for various industries. However, effectively filtering and categorizing this feedback based on custom context-specific criteria requires sophisticated language modeling techniques. While traditional approaches have shown effectiveness, they often require a substantial amount of data to compensate for their modeling deficiencies. In this work, we focus on highlighting the performance and limitations of prompt-free few-shot text classification using open-source pre-trained sentence transformers. On the one hand, our research includes a comprehensive study across different benchmark datasets, encompassing 9 dimensions such as sentiment analysis, topic modeling, grammatical acceptance, and emotion classification. Also, we worked at making different experiences to test Prompt-Free Few-Shot Text Classification. On the other hand, we underline prompt-free few-shot classification limitations when the targeted criteria are complex. As an alternative approach, prompting an instruction-fine-tuned language model has demonstrated favorable outcomes, as proven by our application in the specific use case of Identifying and extracting resolution results and actions from explanatory notes, achieving an accuracy rate of 80%. ]]>
Thu, 12 Sep 2024 11:23:21 GMT /slideshow/a-review-of-prompt-free-few-shot-text-classification-methods-808e/271755238 kevig@slideshare.net(kevig) A REVIEW OF PROMPT-FREE FEW-SHOT TEXT CLASSIFICATION METHODS kevig
A REVIEW OF PROMPT-FREE FEW-SHOT TEXT CLASSIFICATION METHODS from kevig
A Review of Prompt-Free Few-Shot Text Classification Methods /slideshow/a-review-of-prompt-free-few-shot-text-classification-methods/271717491 13424ijnlc01-240911025129-9888452d
Wed, 11 Sep 2024 02:51:29 GMT /slideshow/a-review-of-prompt-free-few-shot-text-classification-methods/271717491 kevig@slideshare.net(kevig) A Review of Prompt-Free Few-Shot Text Classification Methods kevig
A Review of Prompt-Free Few-Shot Text Classification Methods from kevig
Current Issue: August 2024, Volume 13, Number 4 /slideshow/current-issue-august-2024-volume-13-number-4/271558388 welcome-240904122616-efd748d9
A Review of Prompt-Free Few-Shot Text Classification Methods
Rim Messaoudi, Achraf Louiza and Francois Azelart, Akkodis Research, France
Full Text: https://aircconline.com/ijnlc/V13N4/13424ijnlc01.pdf
Interlingual Syntactic Parsing: An Optimized Head-Driven Parsing for English to Indian Language Machine Translation
Pavan Kurariya, Prashant Chaudhary, Jahnavi Bodhankar, Lenali Singh and Ajai Kumar, Centre for Development of Advanced Computing, India
Full Text: https://aircconline.com/ijnlc/V13N4/13424ijnlc02.pdf

Wed, 04 Sep 2024 12:26:16 GMT /slideshow/current-issue-august-2024-volume-13-number-4/271558388 kevig@slideshare.net(kevig) Current Issue: August 2024, Volume 13, Number 4 kevig
Current Issue: August 2024, Volume 13, Number 4 from kevig
August 2024: Top 10 Downloaded Articles in Natural Language Computing /slideshow/august-2024-top-10-downloaded-articles-in-natural-language-computing/271380130 august2024-top10downloadarticlesinijnlc-240828085632-b1f3fc19
Natural Language Processing is a programmed approach to analyzing text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and built software that analyzes, understands, and generates the languages humans naturally use to address computers.

Wed, 28 Aug 2024 08:56:32 GMT /slideshow/august-2024-top-10-downloaded-articles-in-natural-language-computing/271380130 kevig@slideshare.net(kevig) August 2024: Top 10 Downloaded Articles in Natural Language Computing kevig
August 2024: Top 10 Downloaded Articles in Natural Language Computing from kevig
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm /slideshow/enhanced-retrieval-of-web-pages-using-improved-page-rank-algorithm-ae06/271181157 2213ijnlc06-240821053859-7949013a
Information Retrieval (IR) is a very important and vast area. When searching for a context, the web returns all results related to the query, and identifying the relevant result is a tedious task for the user. Word Sense Disambiguation (WSD) is the process of identifying the sense of a word in its textual context when the word has multiple meanings; we use WSD approaches here. This paper presents a Proposed Dynamic Page Rank algorithm, an improved version of the Page Rank algorithm. The Proposed Dynamic Page Rank algorithm gives much better results than Google's existing Page Rank algorithm. To demonstrate this, we computed the Reciprocal Rank for both algorithms and present comparative results.
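For readers unfamiliar with the baseline, the sketch below implements the classic PageRank power iteration and a Reciprocal Rank computation of the kind used for the comparison above; the Proposed Dynamic Page Rank algorithm itself is not reproduced here, and the toy link graph is an invented example.

```python
# Classic PageRank power iteration plus a reciprocal-rank check; a baseline
# sketch only, not the paper's Proposed Dynamic Page Rank algorithm.
import numpy as np

def pagerank(adj: np.ndarray, d: float = 0.85, iters: int = 100) -> np.ndarray:
    n = adj.shape[0]
    out = adj.sum(axis=1, keepdims=True)
    out[out == 0] = 1                     # guard against dangling (sink) pages
    M = (adj / out).T                     # column-stochastic transition matrix
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * M @ r       # damped random-surfer update
    return r

def reciprocal_rank(scores: np.ndarray, relevant: int) -> float:
    order = np.argsort(-scores)           # pages sorted by descending score
    return 1.0 / (int(np.where(order == relevant)[0][0]) + 1)

adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [0, 1, 0]], dtype=float)  # toy link graph: adj[i, j] = i links to j
scores = pagerank(adj)
print(scores, reciprocal_rank(scores, relevant=2))
```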

Wed, 21 Aug 2024 05:38:59 GMT /slideshow/enhanced-retrieval-of-web-pages-using-improved-page-rank-algorithm-ae06/271181157 kevig@slideshare.net(kevig) Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm kevig
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm from kevig
Effect of MFCC Based Features for Speech Signal Alignments /slideshow/effect-of-mfcc-based-features-for-speech-signal-alignments-de41/271007731 2213ijnlc05-240814092306-3588e1e9
The fundamental techniques used for man-machine communication include speech synthesis, speech recognition, and speech transformation. Feature extraction techniques provide a compressed representation of speech signals, and Harmonic plus Noise Model (HNM) analysis and synthesis provide high-quality speech with fewer parameters. Dynamic time warping (DTW) is a well-known technique for aligning two multidimensional sequences: it locates an optimal match between the given sequences, and the improvement in alignment is estimated from the corresponding distances. The objective of this research is to investigate the effect of dynamic time warping on phrase-, word-, and phoneme-based alignments. Speech signals in the form of twenty-five phrases were recorded; the recorded material was segmented manually and aligned at sentence, word, and phoneme level, and the Mahalanobis distance (MD) was computed between the aligned frames. The investigation shows better alignment in the HNM parametric domain, and indicates that effective speech alignment can be carried out even at the phrase level.
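A minimal sketch of the MFCC-plus-DTW alignment idea, assuming librosa and placeholder file names; it is not the authors' pipeline (which works in the HNM parametric domain and uses the Mahalanobis distance), but it shows how two feature sequences are warped onto each other.

```python
# Illustrative MFCC + dynamic time warping alignment of two recordings
# using librosa; the wav file names are placeholders, not the study's data.
import librosa

y1, sr1 = librosa.load("utterance_a.wav", sr=16000)
y2, sr2 = librosa.load("utterance_b.wav", sr=16000)

mfcc1 = librosa.feature.mfcc(y=y1, sr=sr1, n_mfcc=13)  # (13, frames_a)
mfcc2 = librosa.feature.mfcc(y=y2, sr=sr2, n_mfcc=13)  # (13, frames_b)

# D[-1, -1] is the accumulated alignment cost; wp is the optimal warping
# path as pairs of aligned frame indices (Euclidean here, not Mahalanobis).
D, wp = librosa.sequence.dtw(X=mfcc1, Y=mfcc2, metric="euclidean")
print("alignment cost:", D[-1, -1], "path length:", len(wp))
```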

Wed, 14 Aug 2024 09:23:06 GMT /slideshow/effect-of-mfcc-based-features-for-speech-signal-alignments-de41/271007731 kevig@slideshare.net(kevig) Effect of MFCC Based Features for Speech Signal Alignments kevig
Effect of MFCC Based Features for Speech Signal Alignments from kevig
NERHMM: A Tool for Named Entity Recognition Based on Hidden Markov Model /slideshow/nerhmm-a-tool-for-named-entity-recognition-based-on-hidden-markov-model-5bec/270830257 2213ijnlc04-240807055640-e4a5e733
Named Entity Recognition (NER) is considered one of the key tasks in the field of Information Retrieval. NER is the process of recognizing Named Entities (NEs) in a corpus and then organizing them into diverse classes, e.g. names of locations, persons, and organizations, quantities, times, and percentages. There is a great need today to develop a tool for NER, since existing tools are of limited scope. In this paper, we discuss the functionality and features of our NER tool, along with some experimental results.
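To make the Hidden Markov Model machinery concrete, here is a minimal Viterbi decoder over an invented three-tag NE model; the tagset, vocabulary, and probabilities are toy assumptions, not those of the NERHMM tool.

```python
# Toy Viterbi decoding for HMM-based NER; all parameters are invented to
# illustrate the technique, not taken from NERHMM.
import numpy as np

states = ["O", "PER", "LOC"]                     # toy NE tagset
start = np.log([0.8, 0.1, 0.1])                  # P(first tag)
trans = np.log([[0.7, 0.15, 0.15],               # P(next tag | current tag)
                [0.5, 0.4, 0.1],
                [0.5, 0.1, 0.4]])
vocab = {"john": 0, "lives": 1, "in": 2, "delhi": 3}
emit = np.log([[0.1, 0.4, 0.4, 0.1],             # P(word | tag)
               [0.7, 0.1, 0.1, 0.1],
               [0.1, 0.1, 0.1, 0.7]])

def viterbi(words):
    obs = [vocab[w] for w in words]
    V = start + emit[:, obs[0]]                  # best log-prob per state
    back = []
    for o in obs[1:]:
        scores = V[:, None] + trans + emit[:, o] # scores[i, j]: i -> j step
        back.append(scores.argmax(axis=0))       # best predecessor of each j
        V = scores.max(axis=0)
    path = [int(V.argmax())]
    for b in reversed(back):                     # follow backpointers
        path.append(int(b[path[-1]]))
    return [states[i] for i in reversed(path)]

print(viterbi(["john", "lives", "in", "delhi"]))  # e.g. ['PER', 'O', 'O', 'LOC']
```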

Wed, 07 Aug 2024 05:56:40 GMT /slideshow/nerhmm-a-tool-for-named-entity-recognition-based-on-hidden-markov-model-5bec/270830257 kevig@slideshare.net(kevig) NERHMM: A Tool for Named Entity Recognition Based on Hidden Markov Model kevig
NERHMM: A Tool for Named Entity Recognition Based on Hidden Markov Model from kevig
July 2024: Top 10 Download Article in Natural Language Computing /slideshow/july-2024-top-10-download-article-in-natural-language-computing/270688138 july2024top10downloadarticleinnaturallanguagecomputing-240802064552-ab9a21cd
Natural Language Processing is a programmed approach to analyzing text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and built software that analyzes, understands, and generates the languages humans naturally use to address computers.

Fri, 02 Aug 2024 06:45:52 GMT /slideshow/july-2024-top-10-download-article-in-natural-language-computing/270688138 kevig@slideshare.net(kevig) July 2024: Top 10 Download Article in Natural Language Computing kevig
July 2024: Top 10 Download Article in Natural Language Computing from kevig
NLization of Nouns, Pronouns and Prepositions in Punjabi With EUGENE /slideshow/nlization-of-nouns-pronouns-and-prepositions-in-punjabi-with-eugene-19b6/270636857 2213ijnlc03-240731091431-0542e6dc
Universal Networking Language (UNL) has been used by various researchers as an interlingua approach to Automatic Machine Translation (AMT). The UNL system consists of two main components: the EnConverter, IAN (used for converting text from a source language to UNL), and the DeConverter, EUGENE (used for converting text from UNL to a target language). This paper highlights the DeConversion generation rules used by the DeConverter and illustrates their use in the generation of Punjabi sentences. It also covers the results of processing UNL input with the DeConverter EUGENE and its evaluation on UNL constructions involving nouns, pronouns, and prepositions.
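Purely as a hypothetical illustration of a single deconversion step, the sketch below maps a tiny UNL-style relation graph to Punjabi-like SOV word order; the relation names, glosses, and rule are invented for clarity, and real EUGENE rule bases are far richer.

```python
# Invented, minimal "deconversion" step: a UNL-style graph of (relation,
# head, argument) triples is linearized into SOV order, as Punjabi uses.
# Glosses and rules are toy assumptions, not EUGENE's actual rule base.
unl_graph = [("agt", "eat", "child"),   # agent of 'eat' is 'child'
             ("obj", "eat", "apple")]   # object of 'eat' is 'apple'
lexicon = {"eat": "khanda", "child": "bacha", "apple": "seb"}  # toy glosses

def deconvert(graph):
    agent = next(arg for rel, _, arg in graph if rel == "agt")
    obj = next(arg for rel, _, arg in graph if rel == "obj")
    verb = graph[0][1]
    # Emit subject-object-verb order; a real rule base conditions on many
    # morphological and syntactic attributes before linearizing.
    return " ".join(lexicon[w] for w in (agent, obj, verb))

print(deconvert(unl_graph))  # -> "bacha seb khanda"
```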

Wed, 31 Jul 2024 09:14:31 GMT /slideshow/nlization-of-nouns-pronouns-and-prepositions-in-punjabi-with-eugene-19b6/270636857 kevig@slideshare.net(kevig) NLization of Nouns, Pronouns and Prepositions in Punjabi With EUGENE kevig
NLization of Nouns, Pronouns and Prepositions in Punjabi With EUGENE from kevig
Clustering Web Search Results for Effective Arabic Language Browsing /slideshow/clustering-web-search-results-for-effective-arabic-language-browsing-2840/270466020 2213ijnlc02-240724090012-553fa2d2
The process of browsing search results is one of the major problems with traditional Web search engines for English, European, and other languages generally, and for Arabic in particular: it is time-consuming, and the browsing style is unattractive. Organizing Web search results into clusters facilitates users' quick browsing through them. Traditional clustering techniques (data-centric clustering algorithms) are inadequate, since they do not generate clusters with highly readable names or cluster labels. To solve this problem, description-centric algorithms such as the Suffix Tree Clustering (STC) algorithm have been introduced and used successfully and extensively, in various adapted versions, for English, European, and Chinese languages. However, to the best of our knowledge, at the time of writing the STC algorithm has never been applied to clustering Arabic Web-snippet search results. In this paper, we first study how STC can be applied to the Arabic language, and illustrate by example that it is impossible to apply STC after Arabic snippet pre-processing (stem or root extraction), because the merging process yields many redundant clusters. Secondly, to overcome this problem, we propose to integrate STC into a new scheme that takes the properties of the Arabic language into account, in order to make the web better adapted to Arabic users. The proposed approach automatically clusters web search results into high-quality clusters with highly significant labels. The obtained clusters are not only coherent but also convey their contents to users concisely and accurately, so Arabic users can decide at a glance whether the contents of a cluster are of interest. Preliminary experiments and evaluations show that the proposed approach is effective and promising for facilitating Arabic users' quick browsing through search results. Finally, a recommended platform for Arabic Web search results clustering is established based on the Google search engine API.
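A simplified sketch in the spirit of STC, assuming shared word n-grams stand in for suffix-tree phrases: snippets sharing a phrase form a base cluster labelled by that phrase. This illustrates the general description-centric idea only, not the paper's Arabic-specific scheme.

```python
# STC-spirit toy: cluster snippets by shared phrases (word bigrams here,
# rather than a true suffix tree) and label each cluster by that phrase.
# Snippets are invented English stand-ins for Arabic search results.
from collections import defaultdict

def ngrams(tokens, n=2):
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

snippets = ["arabic language processing tools",
            "processing tools for web search",
            "web search results clustering",
            "clustering arabic search results"]

phrase_to_docs = defaultdict(set)
for doc_id, snippet in enumerate(snippets):
    for phrase in ngrams(snippet.split()):
        phrase_to_docs[phrase].add(doc_id)

# Phrases shared by at least two snippets become base clusters, with the
# shared phrase itself serving as a readable cluster label.
base_clusters = {p: d for p, d in phrase_to_docs.items() if len(d) >= 2}
for label, docs in sorted(base_clusters.items()):
    print(f"{label!r}: snippets {sorted(docs)}")
```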

Wed, 24 Jul 2024 09:00:12 GMT /slideshow/clustering-web-search-results-for-effective-arabic-language-browsing-2840/270466020 kevig@slideshare.net(kevig) Clustering Web Search Results for Effective Arabic Language Browsing kevig
Clustering Web Search Results for Effective Arabic Language Browsing from kevig