pyspark.pandas.Series.str.contains
- str.contains(pat, case=True, flags=0, na=None, regex=True)
- Test if pattern or regex is contained within a string of a Series.
- Return boolean Series based on whether a given pattern or regex is contained within a string of a Series.
- Analogous to match(), but less strict, relying on re.search() instead of re.match().
- Parameters
- pat : str
- Character sequence or regular expression.
- case : bool, default True
- If True, case sensitive.
- flags : int, default 0 (no flags)
- Flags to pass through to the re module, e.g. re.IGNORECASE.
- na : default None
- Fill value for missing values. NaN converted to None.
- regex : bool, default True
- If True, assumes the pat is a regular expression. If False, treats the pat as a literal string.
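
To make the parameter descriptions above concrete, here is a minimal sketch of how they might combine for a single value, using only the standard re module. The helper contains_like is hypothetical and is not part of pyspark or pandas. Treating case=False as equivalent to adding re.IGNORECASE, and lower-casing both sides for literal matching, are assumptions made for illustration.

```python
import math
import re


def contains_like(value, pat, case=True, flags=0, na=None, regex=True):
    """Hypothetical scalar helper; not part of pyspark or pandas."""
    # Missing values (None or float NaN) yield the fill value `na`.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return na
    if regex:
        # Assumption: case=False behaves like adding re.IGNORECASE.
        if not case:
            flags |= re.IGNORECASE
        # re.search scans the whole string; re.match only tries position 0.
        return re.search(pat, value, flags) is not None
    # Literal substring test; lower-casing both sides is an assumption.
    if case:
        return pat in value
    return pat.lower() in value.lower()


print(contains_like('dog', 'og', regex=False))   # True
print(contains_like('dog', 'oG'))                # False
print(contains_like('house and parrot', 'PARROT', flags=re.IGNORECASE))  # True
print(contains_like(None, 'og'))                 # None
print(re.match('og', 'dog') is None)             # True: match anchors at the start
```

The last line shows why contains is less strict than match(): 'og' occurs inside 'dog', so re.search finds it, while re.match fails because the string does not start with 'og'.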
 
- Returns
- Series of boolean values or object
- A Series of boolean values indicating whether the given pattern is contained within the string of each element of the Series. 
 
- Examples
- Returning a Series of booleans using only a literal pattern.

>>> import numpy as np
>>> import pyspark.pandas as ps
>>> s1 = ps.Series(['Mouse', 'dog', 'house and parrot', '23', np.nan])
>>> s1.str.contains('og', regex=False)
0    False
1     True
2    False
3    False
4     None
dtype: object

- Specifying case sensitivity using case.

>>> s1.str.contains('oG', case=True, regex=True)
0    False
1    False
2    False
3    False
4     None
dtype: object

- Specifying na to be False instead of NaN replaces NaN values with False. If the Series does not contain NaN values, the resultant dtype will be bool; otherwise, it will be object.

>>> s1.str.contains('og', na=False, regex=True)
0    False
1     True
2    False
3    False
4    False
dtype: bool

- Returning ‘house’ or ‘dog’ when either expression occurs in a string.

>>> s1.str.contains('house|dog', regex=True)
0    False
1     True
2     True
3    False
4     None
dtype: object

- Ignoring case sensitivity using flags with regex.

>>> import re
>>> s1.str.contains('PARROT', flags=re.IGNORECASE, regex=True)
0    False
1    False
2     True
3    False
4     None
dtype: object

- Returning any digit using a regular expression.

>>> s1.str.contains('[0-9]', regex=True)
0    False
1    False
2    False
3     True
4     None
dtype: object

- Ensure pat is not a literal pattern when regex is set to True. Note that in the following example one might expect only s2[1] and s2[3] to return True. However, ‘.0’ as a regex matches any character followed by a 0.

>>> s2 = ps.Series(['40','40.0','41','41.0','35'])
>>> s2.str.contains('.0', regex=True)
0     True
1     True
2    False
3     True
4    False
dtype: bool
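
The ‘.0’ pitfall in the last example can be reproduced with the standard re module alone. The sketch below runs on plain Python strings, not on a pyspark Series. It compares the unescaped pattern with one passed through re.escape, which is one way to make a regex match the text literally.

```python
import re

values = ['40', '40.0', '41', '41.0', '35']

# Unescaped: '.' is a wildcard, so '.0' means "any character followed by 0".
as_regex = [bool(re.search('.0', v)) for v in values]

# re.escape turns '.0' into a pattern that matches a literal dot followed by 0.
literal_pat = re.escape('.0')
as_literal = [bool(re.search(literal_pat, v)) for v in values]

print(as_regex)    # [True, True, False, True, False]
print(as_literal)  # [False, True, False, True, False]
```

With str.contains itself, the parameter description indicates that regex=False treats pat as a literal string, which should give the second result without any escaping.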