Commit feddb99

Updated Description file
1 parent e8e8ffa commit feddb99

6 files changed, +186 -17 lines changed

DESCRIPTION

+13 -12
@@ -11,27 +11,29 @@ Authors@R: c(
 person("Isaac", "Ajao", , "[email protected]", role = "ctb")
 )
 Description: Designed to simplify and streamline the process of reading
-and processing large volumes of data in R. With a collection of
-functions tailored for bulk data operations, the package allows users
-to efficiently read multiple sheets from 'Microsoft Excel'/'Google
-Sheets' workbooks and multiple CSV files from a directory. It returns
-the data as organized data frames, making it convenient for further
-analysis and manipulation. Whether dealing with extensive data sets or
-batch processing tasks, 'bulkreadr' empowers users to effortlessly
-handle data in bulk, saving time and effort in data preparation
-workflows.
+and processing large volumes of data in R, this package offers a
+collection of functions tailored for bulk data operations. It enables
+users to efficiently read multiple sheets from Microsoft Excel and
+Google Sheets workbooks, as well as various CSV files from a
+directory. The data is returned as organized data frames, facilitating
+further analysis and manipulation. Ideal for handling extensive data
+sets or batch processing tasks, bulkreadr empowers users to manage
+data in bulk effortlessly, saving time and effort in data preparation
+workflows. Additionally, the package seamlessly works with labelled
+data from SPSS and Stata.
 License: MIT + file LICENSE
 URL: https://github.com/gbganalyst/bulkreadr
 BugReports: https://github.com/gbganalyst/bulkreadr/issues
 Depends:
 purrr
 Imports:
-dplyr,
 curl,
+dplyr,
 fs,
 googlesheets4,
 haven,
 inspectdf,
+labelled,
 lubridate,
 magrittr,
 openxlsx,
@@ -40,8 +42,7 @@ Imports:
 sjlabelled,
 stats,
 stringr,
-tibble,
-labelled
+tibble
 Suggests:
 knitr,
 rmarkdown,

README.Rmd

+36
@@ -81,6 +81,10 @@ This section provides a concise overview of the different functions available in
 
 ## Other functions in `bulkreadr` package:
 
+- [`generate_dictionary`](#generate_dictionary)
+
+- [`look_for`](#look_for)
+
 - [`pull_out()`](#pull_out)
 
 - [`convert_to_date()`](#convert_to_date)
@@ -213,6 +217,38 @@ data
 
 ```
 
+
+## `generate_dictionary()`
+
+`generate_dictionary()` creates a data dictionary from a specified data frame. This function is particularly useful for understanding and documenting the structure of your dataset, similar to data dictionaries in Stata or SPSS.
+
+```{r}
+
+# Creating a data dictionary from an SPSS file
+
+file_path <- system.file("extdata", "Wages.sav", package = "bulkreadr")
+
+wage_data <- read_spss_data(file = file_path)
+
+generate_dictionary(wage_data)
+```
+
+
+## `look_for()`
+
+The `look_for()` function is designed to emulate the functionality of the Stata `lookfor` command in R. It provides a powerful tool for searching through large datasets, specifically targeting variable names, variable label descriptions, factor levels, and value labels. This function is handy for users working with extensive and complex datasets, enabling them to quickly and efficiently locate the variables of interest.
+
+
+```{r}
+
+# Look for a single keyword.
+
+look_for(wage_data, "south")
+
+look_for(wage_data, "e")
+```
+
+
 ## `pull_out()`
 
 `pull_out()` is similar to `[`. It acts on vectors, matrices, arrays and lists to extract or replace parts. It is pleasant to use with the magrittr (`%>%`) and base(`|>`) operators.
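The `generate_dictionary()` section added in this file describes building a data dictionary from a data frame. As a rough illustration of that idea only, the sketch below assembles a small dictionary in base R from `label` attributes. It does not call bulkreadr and is not the package's implementation; `wage_toy` and `toy_dictionary()` are hypothetical names, and the toy values are made up for the example.

```r
# Hypothetical toy data standing in for a labelled data set.
wage_toy <- data.frame(
  id    = c(1, 2, 3),
  educ  = c(12, 16, NA),
  south = factor(c("lives in South", "does not live in South", "lives in South"))
)

# Variable labels stored in a "label" attribute on each column (assumption:
# this mirrors how labelled imports usually carry their descriptions).
attr(wage_toy$id, "label")    <- "Worker ID"
attr(wage_toy$educ, "label")  <- "Number of years of education"
attr(wage_toy$south, "label") <- "Live in south"

# Hypothetical helper: returns one row per variable with its position,
# name, label, first class, and count of missing values.
toy_dictionary <- function(df) {
  get_label <- function(x) {
    lbl <- attr(x, "label", exact = TRUE)
    if (is.null(lbl)) NA_character_ else as.character(lbl)
  }
  data.frame(
    position    = seq_along(df),
    variable    = names(df),
    description = vapply(df, get_label, character(1)),
    column_type = vapply(df, function(x) class(x)[1], character(1)),
    missing     = vapply(df, function(x) sum(is.na(x)), integer(1)),
    row.names   = NULL,
    stringsAsFactors = FALSE
  )
}

dict <- toy_dictionary(wage_toy)
print(dict)
```

Columns without a `label` attribute get `NA` as their description, so the helper also works on plain, unlabelled data frames.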

README.md

+98 -1
@@ -77,6 +77,10 @@ purposes and are designed to handle importing of data in bulk.
 
 ## Other functions in `bulkreadr` package:
 
+- [`generate_dictionary`](#generate_dictionary)
+
+- [`look_for`](#look_for)
+
 - [`pull_out()`](#pull_out)
 
 - [`convert_to_date()`](#convert_to_date)
@@ -292,6 +296,99 @@ data
 #> # `Highest education level` <fct>
 ```
 
+## `generate_dictionary()`
+
+`generate_dictionary()` creates a data dictionary from a specified data
+frame. This function is particularly useful for understanding and
+documenting the structure of your dataset, similar to data dictionaries
+in Stata or SPSS.
+
+``` r
+
+# Creating a data dictionary from an SPSS file
+
+file_path <- system.file("extdata", "Wages.sav", package = "bulkreadr")
+
+wage_data <- read_spss_data(file = file_path)
+
+generate_dictionary(wage_data)
+#> # A tibble: 9 × 6
+#>   position variable description                     `column type` missing levels
+#>      <int> <chr>    <chr>                           <chr>           <int> <name>
+#> 1        1 id       Worker ID                       dbl                 0 <NULL>
+#> 2        2 educ     Number of years of education    dbl                 0 <NULL>
+#> 3        3 south    Live in south                   fct                 0 <chr>
+#> 4        4 sex      Gender                          fct                 0 <chr>
+#> 5        5 exper    Number of years of work experi… dbl                 0 <NULL>
+#> # ℹ 4 more rows
+```
+
+## `look_for()`
+
+The `look_for()` function is designed to emulate the functionality of
+the Stata `lookfor` command in R. It provides a powerful tool for
+searching through large datasets, specifically targeting variable names,
+variable label descriptions, factor levels, and value labels. This
+function is handy for users working with extensive and complex datasets,
+enabling them to quickly and efficiently locate the variables of
+interest.
+
+``` r
+
+# Look for a single keyword.
+
+look_for(wage_data, "south")
+#>  pos variable label         col_type missing values
+#>  3   south    Live in south fct      0       does not live in South
+#>                                              lives in South
+
+look_for(wage_data, "e")
+#>  pos variable label                              col_type missing
+#>  1   id       Worker ID                          dbl      0
+#>  2   educ     Number of years of education       dbl      0
+#>  3   south    Live in south                      fct      0
+#>
+#>  4   sex      Gender                             fct      0
+#>
+#>  5   exper    Number of years of work experience dbl      0
+#>  6   wage     Wage (dollars per hour)            dbl      0
+#>  7   occup    Occupation                         fct      0
+#>
+#>
+#>
+#>
+#>
+#>  8   marr     Marital status                     fct      0
+#>
+#>  9   ed       Highest education level            fct      0
+#>
+#>
+#>
+#>
+#>  values
+#>
+#>
+#>  does not live in South
+#>  lives in South
+#>  Male
+#>  Female
+#>
+#>
+#>  Management
+#>  Sales
+#>  Clerical
+#>  Service
+#>  Professional
+#>  Other
+#>  Not married
+#>  Married
+#>  Less than h.s. degree
+#>  High school degree
+#>  Some college
+#>  College degree
+#>  Graduate school
+```
+
 ## `pull_out()`
 
 `pull_out()` is similar to `[`. It acts on vectors, matrices, arrays and
@@ -340,7 +437,7 @@ convert_to_date(dates)
 # It can also convert date time object to date object
 
 convert_to_date(lubridate::now())
-#> [1] "2023-09-20"
+#> [1] "2023-11-16"
 ```
 
 ## `inspect_na()`

cran-comments.md

+3 -3
@@ -1,10 +1,10 @@
 ## New version
 
-This is a new version submission. In this version we:
+This is a new version submission. In this version we developed two new functions namely:
 
-- Developed `read_stata_data()` to import Stata data file (`.dta`) into an R data frame, converting labeled variables into factors.
+- `generate_dictionary()`: This function is designed to automatically create a comprehensive data dictionary from labelled datasets. The generated dictionary provides detailed insights into each variable, aiding in better data understanding and management.
 
-- Reduced dependency packages to optimize efficiency.
+- `look_for()`: This enhances the capability to efficiently search within labelled datasets. It allows users to quickly find variable names and their descriptions by searching for specific keywords. This feature streamlines data exploration and analysis, particularly in large datasets with extensive variables.
 
 ## R CMD check results
 
man/bulkreadr-package.Rd

+1 -1
Some generated files are not rendered by default.

vignettes/bulkreadr.Rmd

+35
@@ -78,6 +78,10 @@ This section provides a concise overview of the different functions available in
 
 ## Other functions in bulkreadr package:
 
+- [`generate_dictionary`](#generate_dictionary)
+
+- [`look_for`](#look_for)
+
 - [`pull_out()`](#pull_out)
 
 - [`convert_to_date()`](#convert_to_date)
@@ -211,6 +215,37 @@ data
 
 ```
 
+
+## generate_dictionary()
+
+`generate_dictionary()` creates a data dictionary from a specified data frame. This function is particularly useful for understanding and documenting the structure of your dataset, similar to data dictionaries in Stata or SPSS.
+
+```{r}
+
+# Creating a data dictionary from an SPSS file
+
+file_path <- system.file("extdata", "Wages.sav", package = "bulkreadr")
+
+wage_data <- read_spss_data(file = file_path)
+
+generate_dictionary(wage_data)
+```
+
+
+## look_for()
+
+The `look_for()` function is designed to emulate the functionality of the Stata `lookfor` command in R. It provides a powerful tool for searching through large datasets, specifically targeting variable names, variable label descriptions, factor levels, and value labels. This function is handy for users working with extensive and complex datasets, enabling them to quickly and efficiently locate the variables of interest.
+
+
+```{r}
+
+# Look for a single keyword.
+
+look_for(wage_data, "south")
+
+look_for(wage_data, "e")
+```
+
 ## pull_out()
 
 `pull_out()` is similar to [. It acts on vectors, matrices, arrays and lists to extract or replace parts. It is pleasant to use with the magrittr (`%>%`) and base(`|>`) operators.
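The `look_for()` text added in this commit describes a keyword search over variable names, variable labels, and factor levels. A minimal base-R sketch of that kind of search is shown below, assuming labels are stored in a `label` attribute. It is an illustration only, not the function shipped in bulkreadr; `wage_toy` and `toy_look_for()` are hypothetical names, and the data are invented for the example.

```r
# Hypothetical toy data with variable labels in a "label" attribute.
wage_toy <- data.frame(
  id    = c(1, 2, 3),
  educ  = c(12, 16, 10),
  south = factor(c("lives in South", "does not live in South", "lives in South")),
  sex   = factor(c("Male", "Female", "Female"))
)
attr(wage_toy$id, "label")    <- "Worker ID"
attr(wage_toy$educ, "label")  <- "Number of years of education"
attr(wage_toy$south, "label") <- "Live in south"
attr(wage_toy$sex, "label")   <- "Gender"

# Hypothetical helper: returns the names of variables whose name, label,
# or factor levels contain the keyword. Matching is literal (no regex)
# and case-insensitive.
toy_look_for <- function(df, keyword) {
  needle <- tolower(keyword)
  hit <- vapply(names(df), function(nm) {
    x <- df[[nm]]
    haystack <- c(nm, attr(x, "label", exact = TRUE), if (is.factor(x)) levels(x))
    any(grepl(needle, tolower(haystack), fixed = TRUE))
  }, logical(1))
  names(df)[unname(hit)]
}

toy_look_for(wage_toy, "south")  # "south"
toy_look_for(wage_toy, "male")   # "sex", matched through its factor levels
toy_look_for(wage_toy, "e")      # all four variables
```

A one-letter keyword such as `"e"` matches almost every variable, which is consistent with the long `look_for(wage_data, "e")` listing in README.md.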
