Updated Description file

gbganalyst · gbganalyst · commit feddb991bdd2 · 2023-11-16T18:19:28.000+01:00
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -11,27 +11,29 @@ Authors@R: c(
     person("Isaac", "Ajao", , "isaacoluwaseyiajao@gmail.com", role = "ctb")
   )
 Description: Designed to simplify and streamline the process of reading
-    and processing large volumes of data in R. With a collection of
-    functions tailored for bulk data operations, the package allows users
-    to efficiently read multiple sheets from 'Microsoft Excel'/'Google
-    Sheets' workbooks and multiple CSV files from a directory.  It returns
-    the data as organized data frames, making it convenient for further
-    analysis and manipulation. Whether dealing with extensive data sets or
-    batch processing tasks, 'bulkreadr' empowers users to effortlessly
-    handle data in bulk, saving time and effort in data preparation
-    workflows.
+    and processing large volumes of data in R, this package offers a
+    collection of functions tailored for bulk data operations. It enables
+    users to efficiently read multiple sheets from Microsoft Excel and
+    Google Sheets workbooks, as well as various CSV files from a
+    directory. The data is returned as organized data frames, facilitating
+    further analysis and manipulation. Ideal for handling extensive data
+    sets or batch processing tasks, bulkreadr empowers users to manage
+    data in bulk effortlessly, saving time and effort in data preparation
+    workflows. Additionally, the package seamlessly works with labelled
+    data from SPSS and Stata.
 License: MIT + file LICENSE
 URL: https://github.com/gbganalyst/bulkreadr
 BugReports: https://github.com/gbganalyst/bulkreadr/issues
 Depends:
     purrr
 Imports: 
-    dplyr,
     curl,
+    dplyr,
     fs,
     googlesheets4,
     haven,
     inspectdf,
+    labelled,
     lubridate,
     magrittr,
     openxlsx,
@@ -40,8 +42,7 @@ Imports:
     sjlabelled,
     stats,
     stringr,
-    tibble,
-    labelled
+    tibble
 Suggests:
     knitr,
     rmarkdown,
diff --git a/README.Rmd b/README.Rmd
@@ -81,6 +81,10 @@ This section provides a concise overview of the different functions available in
 
 ## Other functions in `bulkreadr` package:
 
+- [`generate_dictionary()`](#generate_dictionary)
+
+- [`look_for()`](#look_for)
+
 - [`pull_out()`](#pull_out)
 
 - [`convert_to_date()`](#convert_to_date)
@@ -213,6 +217,38 @@ data
 
 ```
 
+
+## `generate_dictionary()`
+
+`generate_dictionary()` creates a data dictionary from a specified data frame. This function is particularly useful for understanding and documenting the structure of your dataset, similar to data dictionaries in Stata or SPSS.
+
+```{r}
+
+# Creating a data dictionary from an SPSS file
+
+file_path <- system.file("extdata", "Wages.sav", package = "bulkreadr")
+
+wage_data <- read_spss_data(file = file_path)
+
+generate_dictionary(wage_data)
+```
+
+
+## `look_for()`
+
+The `look_for()` function is designed to emulate the functionality of the Stata `lookfor` command in R. It provides a powerful tool for searching through large datasets, specifically targeting variable names, variable label descriptions, factor levels, and value labels. This function is handy for users working with extensive and complex datasets, enabling them to quickly and efficiently locate the variables of interest.
+
+
+```{r}
+
+# Look for a single keyword.
+
+look_for(wage_data, "south")
+
+look_for(wage_data, "e")
+```
+
+
 ## `pull_out()` 
 
 `pull_out()` is similar to `[`. It acts on vectors, matrices, arrays and lists to extract or replace parts. It is pleasant to use with the magrittr (`%>%`) and base (`|>`) operators.
diff --git a/README.md b/README.md
@@ -77,6 +77,10 @@ purposes and are designed to handle importing of data in bulk.
 
 ## Other functions in `bulkreadr` package:
 
+- [`generate_dictionary()`](#generate_dictionary)
+
+- [`look_for()`](#look_for)
+
 - [`pull_out()`](#pull_out)
 
 - [`convert_to_date()`](#convert_to_date)
@@ -292,6 +296,99 @@ data
 #> #   `Highest education level` <fct>
 ```
 
+## `generate_dictionary()`
+
+`generate_dictionary()` creates a data dictionary from a specified data
+frame. This function is particularly useful for understanding and
+documenting the structure of your dataset, similar to data dictionaries
+in Stata or SPSS.
+
+``` r
+
+# Creating a data dictionary from an SPSS file
+
+file_path <- system.file("extdata", "Wages.sav", package = "bulkreadr")
+
+wage_data <- read_spss_data(file = file_path)
+
+generate_dictionary(wage_data)
+#> # A tibble: 9 × 6
+#>   position variable description                     `column type` missing levels
+#>      <int> <chr>    <chr>                           <chr>           <int> <name>
+#> 1        1 id       Worker ID                       dbl                 0 <NULL>
+#> 2        2 educ     Number of years of education    dbl                 0 <NULL>
+#> 3        3 south    Live in south                   fct                 0 <chr> 
+#> 4        4 sex      Gender                          fct                 0 <chr> 
+#> 5        5 exper    Number of years of work experi… dbl                 0 <NULL>
+#> # ℹ 4 more rows
+```
+
+## `look_for()`
+
+The `look_for()` function is designed to emulate the functionality of
+the Stata `lookfor` command in R. It provides a powerful tool for
+searching through large datasets, specifically targeting variable names,
+variable label descriptions, factor levels, and value labels. This
+function is handy for users working with extensive and complex datasets,
+enabling them to quickly and efficiently locate the variables of
+interest.
+
+``` r
+
+# Look for a single keyword.
+
+look_for(wage_data, "south")
+#>  pos variable label         col_type missing values                
+#>  3   south    Live in south fct      0       does not live in South
+#>                                              lives in South
+
+look_for(wage_data, "e")
+#>  pos variable label                              col_type missing
+#>  1   id       Worker ID                          dbl      0      
+#>  2   educ     Number of years of education       dbl      0      
+#>  3   south    Live in south                      fct      0      
+#>                                                                  
+#>  4   sex      Gender                             fct      0      
+#>                                                                  
+#>  5   exper    Number of years of work experience dbl      0      
+#>  6   wage     Wage (dollars per hour)            dbl      0      
+#>  7   occup    Occupation                         fct      0      
+#>                                                                  
+#>                                                                  
+#>                                                                  
+#>                                                                  
+#>                                                                  
+#>  8   marr     Marital status                     fct      0      
+#>                                                                  
+#>  9   ed       Highest education level            fct      0      
+#>                                                                  
+#>                                                                  
+#>                                                                  
+#>                                                                  
+#>  values                
+#>                        
+#>                        
+#>  does not live in South
+#>  lives in South        
+#>  Male                  
+#>  Female                
+#>                        
+#>                        
+#>  Management            
+#>  Sales                 
+#>  Clerical              
+#>  Service               
+#>  Professional          
+#>  Other                 
+#>  Not married           
+#>  Married               
+#>  Less than h.s. degree 
+#>  High school degree    
+#>  Some college          
+#>  College degree        
+#>  Graduate school
+```
+
 ## `pull_out()`
 
 `pull_out()` is similar to `[`. It acts on vectors, matrices, arrays and
@@ -340,7 +437,7 @@ convert_to_date(dates)
 # It can also convert date time object to date object 
 
 convert_to_date(lubridate::now())
-#> [1] "2023-09-20"
+#> [1] "2023-11-16"
 ```
 
 ## `inspect_na()`
diff --git a/cran-comments.md b/cran-comments.md
@@ -1,10 +1,10 @@
 ## New version
 
-This is a new version submission. In this version we:
+This is a new version submission. In this version we developed two new functions, namely:
 
-- Developed `read_stata_data()` to import Stata data file (`.dta`) into an R data frame, converting labeled variables into factors.
+- `generate_dictionary()`: This function is designed to automatically create a comprehensive data dictionary from labelled datasets. The generated dictionary provides detailed insights into each variable, aiding in better data understanding and management.
 
-- Reduced dependency packages to optimize efficiency.
+- `look_for()`: This function makes it easier to search within labelled datasets. It allows users to quickly find variables by searching their names and descriptions for specific keywords. This streamlines data exploration and analysis, particularly in large datasets with many variables.
 
 ## R CMD check results
 
diff --git a/man/bulkreadr-package.Rd b/man/bulkreadr-package.Rd
diff --git a/vignettes/bulkreadr.Rmd b/vignettes/bulkreadr.Rmd
@@ -78,6 +78,10 @@ This section provides a concise overview of the different functions available in
 
 ## Other functions in bulkreadr package:
 
+- [`generate_dictionary()`](#generate_dictionary)
+
+- [`look_for()`](#look_for)
+
 - [`pull_out()`](#pull_out)
 
 - [`convert_to_date()`](#convert_to_date)
@@ -211,6 +215,37 @@ data
 
 ```
 
+
+## generate_dictionary()
+
+`generate_dictionary()` creates a data dictionary from a specified data frame. This function is particularly useful for understanding and documenting the structure of your dataset, similar to data dictionaries in Stata or SPSS.
+
+```{r}
+
+# Creating a data dictionary from an SPSS file
+
+file_path <- system.file("extdata", "Wages.sav", package = "bulkreadr")
+
+wage_data <- read_spss_data(file = file_path)
+
+generate_dictionary(wage_data)
+```
+
+
+## look_for()
+
+The `look_for()` function is designed to emulate the functionality of the Stata `lookfor` command in R. It provides a powerful tool for searching through large datasets, specifically targeting variable names, variable label descriptions, factor levels, and value labels. This function is handy for users working with extensive and complex datasets, enabling them to quickly and efficiently locate the variables of interest.
+
+
+```{r}
+
+# Look for a single keyword.
+
+look_for(wage_data, "south")
+
+look_for(wage_data, "e")
+```
+
 ## pull_out()
 
 `pull_out()` is similar to `[`. It acts on vectors, matrices, arrays and lists to extract or replace parts. It is pleasant to use with the magrittr (`%>%`) and base (`|>`) operators.