Evidence for the existence of formulaic language has been found in a variety of research fields. Over the last decade, approaches to language teaching have recognised the need to take such linguistic phenomena into account, the most comprehensive presentation to date being the taxonomy of "lexical phrases" by Nattinger and DeCarrico (1992). They proposed the lexical phrase as a pedagogically applicable formulaic sequence, and specified lexical phrases to be used in teaching academic writing to learners of English as a foreign language. However, some of these lexical phrases did not intuitively seem typical of academic writing in English, possibly because of the limitations of the linguistic data on which they were based. It was felt that corpus research might provide a clearer, less intuition-based insight into these units and into how their form and use differ across academic contexts. This paper therefore presents a corpus-based examination of the formal and functional variation of one of these lexical phrases as used by writers in academic social science, medical, and technical disciplines. It attempts to highlight the relationship between the syntagmatic and paradigmatic variation of this lexical phrase and its discourse signalling and organising functions in different academic disciplines.
There has long been evidence from cognitive psychology, from studies of first and second language acquisition, and from textual description suggesting that speakers may possess a non-homogeneous store of language knowledge, consisting of a system of generative grammatical rules and a store of pre-assembled patterns, and that a speaker at times bypasses the rules and retrieves a prefabricated pattern instead. From a cognitive perspective, Pawley and Syder argued that the majority of a speaker's output is to some degree memorised, and that only "a minority of spoken clauses are entirely novel creations in the sense that the combination of lexical items is new to the speaker." (Pawley and Syder 1983: 205). Bolinger drew on the work of Van Lancker and suggested that the lateralisation of functions in the cortex "points to a side which files things and a side which puts them together," (Bolinger 1976: 13), and that formulaic language is "part of the automatic or semi-automatic store which continues to be more or less automatic, even when passed through the analytical sieve that separates them." (ibid.: 13). From a psycholinguistic perspective, Peters (1983 cited in Weinert 1995: 181) found evidence of formulaic language in first language acquisition, and Hakuta (1974) found it in data from children learning a second language. More recently, Sinclair has held that speakers and writers can use "a large number of semi-preconstructed phrases that constitute single choices, even though they might appear to be analyzable into segments." (Sinclair 1991: 110). For Sinclair, this formulaic language operating under the idiom principle is "at least as important as grammar in the explanation of how meaning arises in text," (ibid.: 112). Wray (1999) has recently presented a thorough survey of formulaic language across these different fields.
The above evidence suggests that formulaic language of some kind features widely in language learning and language use, and thus may be important for learners and users of second languages. Cowie, for example, holds that "the sheer density of ready-made units in various types of written text is a fact that any approach to the teaching of writing to foreign students has to come to terms with." (Cowie 1992: 10). Lewis, an influential force in raising awareness of formulaic language in English language teaching during the 1990s, similarly contends that "chunking of written text principally involves words, word partnerships and, for those learning to write in a particular genre such as academic English, developing an awareness of the sentence heads and frames typical of the genre." (Lewis 1996: 15).
As already mentioned, the focus of the present study is the lexical phrase, a term which has been used frequently in the English language teaching literature over the past decade, not least owing to the contribution made by Lewis. The term dates back at least to Becker (1975), although in the field of Information Retrieval it is also used to refer to types of compound nouns (cf. Krovetz 1997).
Nattinger and DeCarrico used the term to describe a pedagogically applicable unit of formulaic language, one which has a categorical form and a pragmatic discourse function. Among other uses, they specified lexical phrases to help students with the organisation and form of their essays, based on the observation that:
"the typical essay a student writes in North American universities... adheres to the following structure:
1. Opening
   A: Topic priming: sets the scene and prepares the reader for what is to follow.
   B: Topic nomination
      a) statement of purpose: explains what the writer intends;
   C: Statement of organisation: explains how the writer will talk about the topic.
2. Body: sets forth the argument, conveys the information.
3. Closing: brings the argument to a close." (ibid.: 164)
They then provide lists of "representative lexical phrases for the above categories." (ibid.: 165). However, this model of academic discourse organisation appears rather superficial. It might be asked why the model has three complex stages for the opening, yet only one each for the body and the closing. And to what degree are the phrases specified for teaching academic writing "representative" of academic English? D. Willis, for example, speculates that "the phrases [Nattinger and DeCarrico] cite are salient according to the model they adopt, rather than frequent and typical of the data." (D. Willis 1995: 88). The data used by Nattinger and DeCarrico in formulating these lexical phrases is described only briefly, as "written discourse collected from a variety of textbooks for ESL, textbooks for academic courses, letters to the editor of various news publications, and personal correspondence." (Nattinger and DeCarrico 1992: xvi). There is a distinct lack of detail as to the quantity and attestedness of this data, and this gave rise to the question behind the present study: are the lexical phrases as specified replicable in authentic corpus data? Put more exactly, do word strings with a form and function similar to these lexical phrases occur in published academic prose? If they do, how do their form and function vary across disciplines? The lexical phrase chosen in the attempt to answer these questions is described next.
The object of the present study is the lexical phrase it is/has been (often) asserted/believed/noted that X. Nattinger and DeCarrico term this a "sentence builder" lexical phrase, one which provides the framework for a whole sentence (ibid.: 42). This is a discontinuous and highly variable category, and this particular example has the potential to frame a long and complex sentence; a writer whose first language is not English would need a sound grasp of what variation is and is not permissible in order to do this successfully. More paradigmatic variation seems possible than is specified: verbs such as argue or claim could also be used here. Similarly, syntagmatic variation would also seem possible: there is no obvious reason why often should have to stay in the specified position after been. This lexical phrase is intended to function as a macro-organiser which "primes a topic", defined for academic writing as the way the writer "sets the scene and prepares the reader for what is to follow." (ibid.: 164).
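The paradigmatic and syntagmatic variation described above can be made concrete as a search pattern. The Python sketch below is a hypothetical illustration, not the retrieval procedure used in this study: it matches it is/has been ____ that strings in raw text with a regular expression and records the verb, the tense and any adverb. The verb list beyond asserted, believed and noted, and the whole adverb list, are assumptions chosen for illustration; the adverb may appear after has, before the participle, or after it.

```python
import re

# Illustrative word lists (assumptions, not taken from the study). The first
# three verbs are Nattinger and DeCarrico's; the rest show paradigmatic variation.
VERBS = ["asserted", "believed", "noted", "argued", "claimed",
         "recognized", "recognised", "suggested", "shown", "proposed"]
ADVERBS = ["often", "widely", "generally", "commonly", "sometimes"]

verb_alt = "|".join(VERBS)
adv_alt = "|".join(ADVERBS)

# it + (is | has [ADV] been) + [ADV] + participle + [ADV] + that
PATTERN = re.compile(
    rf"\bit\s+"
    rf"(?P<aux>is|has(?:\s+(?P<adv1>{adv_alt}))?\s+been)\s+"
    rf"(?:(?P<adv2>{adv_alt})\s+)?"
    rf"(?P<verb>{verb_alt})\s+"
    rf"(?:(?P<adv3>{adv_alt})\s+)?"
    rf"that\b",
    re.IGNORECASE,
)


def find_phrases(text):
    """Return (verb, tense, adverb-or-None) for each matching string."""
    results = []
    for m in PATTERN.finditer(text):
        aux = m.group("aux").lower()
        tense = "present simple" if aux == "is" else "present perfect"
        # At most one adverb slot is expected to be filled in practice.
        adverb = m.group("adv1") or m.group("adv2") or m.group("adv3")
        results.append((m.group("verb").lower(), tense,
                        adverb.lower() if adverb else None))
    return results


sample = ("It is widely recognized that depression is common. "
          "It has often been argued that X. It has been noted often that Y.")
print(find_phrases(sample))
```

Fixed word lists will miss any verb or adverb not listed; a real corpus query would more plausibly rely on the part-of-speech tags already present in the BNC.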
The corpus used in this study is a subset of the British National Corpus (BNC). The current BNC classification relies on a broad categorisation of texts by "domains" such as leisure, business, imaginative and informative, which makes it difficult to isolate texts from different academic subject disciplines. A discussion of the nature of genre, register and text type would be appropriate here, but this contentious issue is beyond the scope of this paper. For the purpose of this study, files representing technical, social science and medical texts were taken from the BNC on the basis of the "genre" categorisation proposed by Lee (2000). A full list of the files used is available from the author on request.
Five discourse functions were identified for the occurrences of these strings, which arguably qualifies them as lexical phrases.
In all three disciplines, strings of the form it is/has been ____ that X occur with some kind of topic function, although, unsurprisingly, in published academic book extracts and articles this seems to be a more complex function than the one proposed by Nattinger and DeCarrico for student essays. It can operate at higher or lower levels, introducing or "priming" a topic or idea which controls a greater or lesser amount of text, such as a chapter, section, or paragraph. The number of strings with a topic priming function will have been affected by the incomplete nature of many of the texts in the corpus, but the occurrences do suggest that lexical phrases are used for a function of this nature, although with more paradigmatic variation than simply asserted, believed, or noted. The phrases are much more likely to be varied adverbially when they are in the present simple.
Here the lexical phrase is a discourse device which links the point the writer is currently making in the text with a relevant section later on. In each discipline it is used more in the present perfect than in the present simple.
The most common lexical phrase function in all disciplines and in both tenses is that of non-cited support: writers bring in non-conflicting factual information or arguments from outside sources which are not attributed in any way. This flies in the face of citation practices commonly taught in academic writing programmes. The instances of show and suggest in medical texts all perform this function. When outside information is brought in and attributed to a source (the "support cited" function), the phrases are used equally in the present simple and the present perfect. The medical writers in this study are notable in that they very seldom use these lexical phrases to attribute outside sources.
The "straw man" function occurs when a writer wishes to introduce an argument which he or she intends to evaluate negatively, i.e. dispute. The strings used in this way occur mostly in medical texts, and hardly at all in technical writing. This function of the pattern is the first move in a debate; in the second move the writer disputes the proposition contained in the first. The first move thus acts as a "straw man" which the writer sets up in order to knock it down in the second move. This corresponds to the attributive use of ARGUE identified by Hunston (1995) in a COBUILD corpus study of verbs of attribution, which found that ARGUE + that is usually used in conflicts. The attributor (i.e. the writer of the principal text) uses it to state another authority's proposition (the 1ST MOVE) and afterwards gives it his or her own evaluation (the 2ND MOVE), either positively (agreeing) or negatively (disputing). Hunston finds that "an attributed statement introduced by ARGUE, irrespective of context, is likely to be negatively evaluated." (ibid.: 153). However, in these subcorpora, the author of the proposition was named in only two of the cases where argue introduced a "straw man".
The effect of tense, syntagmatic, and paradigmatic variation on function is complex and not easily pinned down in corpora of such different sizes. In this final section some illustrative examples will be described. There is some overlap between functions: in an abstract, for example, it is argued that both introduces the reader to an idea contained in the paper and refers the reader to another part of the text. Similarly, a present simple string of the form it is ____ that can act as both a topic primer and a "straw man".
A clue to the relationship between syntagmatic variation and the pragmatic function of lexical phrases can be seen in the extracts below:
The empirical research carried out here is intended to test the feasibility of this approach, as well as providing some general indications of what the effects might be and of one way by which they can be represented on the final output map. In so doing it is recognized that many of the questions raised are left open for subsequent investigation and that the results are initial and tentative.
It is widely recognized that the proportion of women who suffer mental disorders - particularly depression - exceeds that of men (Cochrane, 1983).
In the first extract, it is the writer who is doing the recognising, i.e. recognising that more research is needed. In the second extract, the recognition is of a fact accepted by the writer and the wider discourse community, which the writer intends the reader also to recognise. During the categorisation process, the first example would have been discarded, and the second would have been classed as "support cited". In the present simple at least, inserting the adverb widely changes the sense of the verb and thus alters the function of the phrase.
A related phenomenon in strings of this type has been noted by Johns (1991), who observed that tense choice also affects function. He suggests that:
"in English science and engineering academic abstracts, the present perfect is specifically used to refer to the work of other scientists. For example It is proposed that ... suggests that the writer of the abstract is doing the proposing, but It has been proposed that ... suggests that the proposing is done by someone other than the writer." (Johns 1991 quoted in Baker 1992: 101)
This would appear to relate tense choice to function: It is proposed that ... contains a performative verb, so the string does not qualify as a lexical phrase, while It has been proposed that ... performs a support function and can thus be judged a lexical phrase.
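The two observations above (Johns on tense, and the effect of widely in the present simple) can be combined into a rough decision rule. The Python sketch below is purely hypothetical and is not the categorisation procedure used in this study. In particular, treating any inserted adverb like widely is an assumption generalised from a single example. The function name rough_function and its labels are invented for illustration.

```python
from typing import Optional


def rough_function(tense: str, adverb: Optional[str]) -> str:
    """Hypothetical rule of thumb, not the study's categorisation.

    tense is "present simple" or "present perfect"; adverb is an inserted
    adverb such as "widely", or None.
    """
    if tense == "present perfect":
        # Johns (1991): the proposing is done by someone other than the writer.
        return "support"
    if adverb is not None:
        # Assumption generalised from 'It is widely recognized that':
        # the recognition belongs to the wider discourse community.
        return "support"
    # e.g. 'It is proposed that': the writer's own (performative) act,
    # so the string would not count as a lexical phrase.
    return "performative"


print(rough_function("present perfect", None))   # support
print(rough_function("present simple", "widely"))  # support
print(rough_function("present simple", None))    # performative
```

Such a rule is only a first approximation: the overlap between functions and the use of non-cited support in both tenses, reported above, show that many strings would be misclassified by tense and adverb alone.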
It should be clear that, if we accept the definition of a lexical phrase in principle, then the function assigned to this particular phrase needs broadening. In addition to topic priming, instances of this phrase being used for attribution (with both positive and negative evaluation of the attributed statement) and for in-text referral have been found in all three disciplines.
Arnaud, P. J. L. and Béjoint, H. (eds) 1992. Vocabulary and Applied Linguistics. Basingstoke: Macmillan.
Baker, M. 1992. In Other Words: A Coursebook on Translation. London: Routledge.
Becker, J. 1975. "The Phrasal Lexicon." in Nash-Webber, B. and Schank, R. (eds).
Bolinger, D. 1976. "Meaning and Memory." Forum Linguisticum 1/1: 1-14.
Cowie, A. P. 1992. "Multiword Lexical Units and Communicative Language Teaching." in Arnaud, P. J. L. and Béjoint, H. (eds).
Hakuta, K. 1974. "Pre-fabricated Patterns and the Emergence of Structure in Second Language Acquisition." Language Learning 24/2.
Hunston, S. 1995. "A Corpus Study of English Verbs of Attribution." Functions of Language 2/2.
Krovetz, R. 1997. "Homonymy and Polysemy in Information Retrieval." in Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics: 72-79.
Lee, D. 2000. "Navigating through the BNC Jungle Using Genre." Paper presented at TaLC 2000, 4th International Conference on Teaching and Language Corpora. Karl-Franz University, Graz, Austria. July 19-23, 2000.
Lewis, M. 1996. "Implications of a Lexical View of Language." in Willis, J. and Willis, D. (eds).
Nash-Webber, B. and Schank, R. (eds) 1975. Theoretical Issues in Natural Language Processing 1. Cambridge, Mass.: Bolt, Beranek and Newman.
Nattinger, J. R. and DeCarrico, J. 1992. Lexical Phrases and Language Teaching. Oxford: Oxford University Press.
Pawley, A. and Syder, F. H. 1983. "Two Puzzles for Linguistic Theory: Nativelike Selection and Nativelike Fluency." in Richards, J. C. and Schmidt, R. W. (eds).
Richards, J. C. and Schmidt, R. W. (eds) 1983. Language and Communication. London: Longman.
Sinclair, J. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Weinert, R. 1995. "The Role of Formulaic Language in Second Language Acquisition." Applied Linguistics 16/2.
Willis, D. 1995. "Review of Lexical Phrases and Language Teaching." English Language Teaching Journal 49/1: 87-90.
Willis, J. and Willis, D. (eds) 1996. Challenge and Change in Language Teaching. Oxford: Heinemann.
Wray, A. 1999. "Formulaic Language in Learners and Native Speakers." Language Teaching 32: 213-231.