Order Sentences from Short to Long
Given the current Covid-19 control measures in our country, my wife and daughter are both keen to go abroad.
Learning English and getting a good IELTS score is a realistic way to achieve this goal.
The first obstacle seemed to be learning sentences by heart. Unfortunately, it was not.
When I scraped dozens of sentences from YouTube in an alternating English and Chinese format, I found that my little princess could not read the first sentence, even after a while and with my help.
Background
Fig 1: Original file: Not ordered
The first sentence seemed a bit long for a ten-year-old Chinese girl.
So I used Python to find the shortest sentences and move them to the top of the file.
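As a minimal sketch of the idea, the snippet below sorts a few made-up sample sentences (they are not from the original file). It also shows that "short" can mean fewer characters or fewer words, and that the two orderings may differ; the variable names are only illustrative.

```python
# Hypothetical sample sentences, used only for illustration.
sentences = [
    "How are you today?",
    "Congratulations!",
    "I am ok.",
]

# key=len on a string counts characters.
by_chars = sorted(sentences, key=len)

# Counting whitespace-separated words gives a different notion of "short".
by_words = sorted(sentences, key=lambda s: len(s.split()))

print(by_chars)
print(by_words)
```

For a young reader, the word count may be the more useful measure, since one long word can still be a short sentence.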
Code
# Mission: move the shortest sentences to the top of the file

# Task 1: Input -- read the md file line by line. A ".md" file is plain text, just like ".txt".
path = r"/Users/tangqiang/private/"  # set path
file_name = "IELTS900.md"  # set file name
with open(path + file_name, encoding="utf-8") as f:  # open IELTS900.md
    txt = f.readlines()  # read the whole file into a list of lines named "txt"

# Task 2: Process -- drop blank lines, then split English from Chinese
dry = []  # list of non-blank lines
for el in txt:  # iterate over all elements in txt
    if el.strip():  # blank lines (spaces, newlines, etc.) are not recorded
        dry.append(el.strip())  # strip() removes whitespace before and after the text


# Helper functions normally go at the top of the file; this one sits here to make the code easier to follow
def is_contains_chinese(strs):
    """
    Check every character in the string; return True if any one is Chinese.
    """
    for _char in strs:
        if '\u4e00' <= _char <= '\u9fa5':  # common range of CJK Unified Ideographs
            return True
    return False


# Split English and Chinese into two parts
res_en = []  # English sentences
res_cn = []  # Chinese sentences
for el in dry:
    if not is_contains_chinese(el):  # no Chinese characters: treat as English
        res_en.append(el)
    else:
        res_cn.append(el)

# Map each English sentence to its Chinese translation.
# This assumes the file strictly alternates English and Chinese lines.
dic = dict(zip(res_en, res_cn))
new_en = sorted(res_en, key=len)  # sort the English sentences by length (number of characters)

# Task 3: Output -- write the reordered sentences to a new file
# "w" mode creates the file, or overwrites it, so re-running the script does not duplicate the content
with open(path + "ordered_" + file_name, "w", encoding="utf-8") as f:
    for el in new_en:
        f.write(f"{el}\n")  # English sentence, from short to long
        f.write(f"{dic[el]}\n\n")  # the matching Chinese sentence
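One weak point of the dictionary lookup is that two identical English sentences share one key, so only the last translation survives. A possible alternative, sketched below, keeps each English line together with the Chinese line that follows it and sorts the pairs. The function name `pair_and_sort` and the sample lines are hypothetical, and the sketch assumes each English line is immediately followed by its translation.

```python
def has_chinese(text):
    """Return True if any character falls in the common CJK ideograph range."""
    return any('\u4e00' <= ch <= '\u9fa5' for ch in text)


def pair_and_sort(lines):
    """Pair each English line with the next Chinese line, then sort by English length."""
    pairs = []
    pending_en = None  # English line still waiting for its translation
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if has_chinese(line):
            if pending_en is not None:
                pairs.append((pending_en, line))
                pending_en = None
        else:
            # An English line with no translation is replaced by the next English line.
            pending_en = line
    # sorted() is stable, so equal-length sentences keep their original order.
    return sorted(pairs, key=lambda pair: len(pair[0]))


# Hypothetical sample input, laid out like the scraped file.
sample = [
    "How are you today?\n", "你今天好吗?\n", "\n",
    "Thank you.\n", "谢谢。\n",
    "Thank you.\n", "多谢。\n",
]
for english, chinese in pair_and_sort(sample):
    print(english)
    print(chinese)
    print()
```

Because the pairs are kept as tuples, a repeated English sentence keeps its own translation instead of overwriting the earlier one.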
Effect
In the end, we got what we needed: the sentences ordered from short to long.
Fig 2: Ordered file from short to long