assignOps {base} | R Documentation |
Assignment Operators
Description.
Assign a value to a name.
x: a variable name (possibly quoted).
value: a value to be assigned to x.
There are three different assignment operators: two of them have leftwards and rightwards forms.
The operators <- and = assign into the environment in which they are evaluated. The operator <- can be used anywhere, whereas the operator = is only allowed at the top level (e.g., in the complete expression typed at the command prompt) or as one of the subexpressions in a braced list of expressions.
The operators <<- and ->> are normally only used in functions, and cause a search to be made through parent environments for an existing definition of the variable being assigned. If such a variable is found (and its binding is not locked) then its value is redefined, otherwise assignment takes place in the global environment. Note that their semantics differ from that in the S language, but are useful in conjunction with the scoping rules of R . See ‘The R Language Definition’ manual for further details and examples.
In all the assignment operator expressions, x can be a name or an expression defining a part of an object to be replaced (e.g., z[[1]]). A syntactic name does not need to be quoted, though it can be (preferably by backticks).
The leftwards forms of assignment <- = <<- group right to left, the other from left to right.
value. Thus one can use a <- b <- c <- 6.
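As a brief illustration of the grouping rule, the rightwards form, and the behaviour of <<-, here is a minimal sketch. The names a, b, c and d are arbitrary, and make_counter is a hypothetical helper written only for this example.

```r
# Leftwards assignments group right to left: c is assigned first, then b, then a.
a <- b <- c <- 6
cat(a, b, c, "\n")

# The rightwards form assigns the value on the left to the name on the right.
7 -> d
d

# <<- searches enclosing environments for an existing binding.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # updates `count` in the enclosing function environment
    count
  }
}
counter <- make_counter()
counter()  # 1
counter()  # 2
```

Because count already exists in the environment created by make_counter(), the <<- operator updates it there rather than creating a global variable.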
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language . Wadsworth & Brooks/Cole.
Chambers, J. M. (1998) Programming with Data. A Guide to the S Language . Springer (for = ).
assign (and its inverse get); for “subassignment” such as x[i] <- v, see [<-; further, environment.
Variable Assignment in R
In R we operate with variables. A variable can be seen as a container for a value. To get a better conceptual understanding of this, you can go through the following and code-along in your own R -session.
Assigning a Value to a Variable
- In R , we state values directly in the chunk or the console, e.g.:
Here, we just state 3 , so R simply “throws” that right back at you!
Now, if we want to “catch” that 3, we have to assign it to a variable, e.g.:
- Notice how now we “catch” the 3 and nothing is “thrown” back to you, because we now have the 3 stored in x :
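The three steps above (stating a value, assigning it, and inspecting the variable) can be reproduced with the following minimal sketch; the comments describe what the console shows.

```r
3        # stated directly, so R echoes it: [1] 3
x <- 3   # assigned to x, so nothing is printed
x        # inspecting x prints the stored value: [1] 3
```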
Updating the Value of a Variable
- Now, we can of course use x moving forward, e.g. by adding 2 :
- Notice how this does not change x and the result is simply “thrown” right-back-at-ya
- If we wanted to update x by adding 2 , we would have to “catch” the result as before:
- Now, we have updated x :
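A short sketch of the difference between computing with x and updating x, starting from x holding 3 as before:

```r
x <- 3
x + 2       # the result 5 is printed, but x is unchanged
x           # still 3
x <- x + 2  # store the result back into x
x           # now 5
```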
Use one Variable in the Creation of Another
- Analogously, we can create a new variable using x:
- Again, this does not change x
- But rather the result is now stored in y
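The same idea in code, as a small sketch; the expression x * 2 is an arbitrary choice made for illustration.

```r
x <- 5
y <- x * 2  # uses x, but stores the result in y
x           # unchanged: 5
y           # 10
```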
- In R , we use the assignment operator <- to perform assignment
- Variables are not changed in place; a result must be stored to be kept
- Note, this also applies to running e.g. a dplyr -pipeline, where we do not change the dataset by running the pipeline, but we must store the result of the pipeline
Before continuing, make sure that you are on track with the above concepts!
- Create a new variable my_age containing… You guessed it!
- Add 0.5 to the variable (i.e., your age when you’re done with this course)
- Check the value of my_age , did you remember to assign, thereby updating?
Pipeline Example
- Let us create some example sequence data:
- Notice, that our data creation is just “thrown” back at us, we forgot something!
- Now, we have stored the data in the variable my_dna_data
Note here, that a variable can as we saw before with x and y store a single value, e.g. 2 , but here, we are storing a tibble -object in the variable my_dna_data and in that tibble -object, we have a variable sequence , which contains some randomly generated dna.
But what if we wanted to add a new variable to the tibble object, which is the length of each of the DNA sequences?
Nice! Let’s see that data again then:
Wait! What? Where is the variable we literally just created?
We forgot something… We did not update the my_dna_data , let’s fix that:
- Note, nothing is “thrown” back at us! Let’s verify that we did indeed update my_dna_data:
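Here is a minimal sketch of the same pitfall, written with a base R data frame and transform() so that it runs without extra packages. The text above uses a tibble and a dplyr pipeline, where mutate() would play the role of transform(). The sequences and the column name sequence_length are made up for illustration.

```r
my_dna_data <- data.frame(
  sequence = c("atgc", "ggatcc", "ttaacg"),
  stringsAsFactors = FALSE
)

# The result is printed but not stored, so my_dna_data is unchanged.
transform(my_dna_data, sequence_length = nchar(sequence))
names(my_dna_data)

# Assign the result back to update the variable.
my_dna_data <- transform(my_dna_data, sequence_length = nchar(sequence))
names(my_dna_data)
```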
Did it make sense? Check yourself, add a new variable to my_dna_data called sequence_capital by using the function str_to_upper()
That’s it - Hope it helped and remember… Bio data science in R is really fun!
Life With Data
- by bprasad26
assign() Function in R
assign() is a function in R that allows us to create a new variable and assign it a value, or assign a new value to an existing variable. While this might seem similar to the traditional assignment operators <- and =, assign() is more flexible because the variable name is given as a string and can therefore be built dynamically. In this article, we will explore the assign() function in depth, discussing its syntax, parameters, and usage, and illustrating how to use it in different scenarios.
Understanding the assign() Function
The basic syntax for the assign() function is as follows:
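The argument list can also be inspected from within R. In the sketch below, args() is a base R function and the variable name answer is just an illustrative choice.

```r
# Print the argument list of assign() as defined in base R.
args(assign)

# A call that spells out the two required arguments by name.
assign(x = "answer", value = 42)
answer
```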
The function includes the following parameters:
- x : This is a character vector of length one specifying the name of the variable to which we assign a value.
- value : The value to be assigned.
- pos : Where to do the assignment. The default of -1 refers to the current environment.
- envir : The environment to use. It is an alternative to pos and is derived from pos when not supplied.
- inherits : Should the enclosing frames of the environment be searched for an existing variable named x? If TRUE and one is found, the assignment happens there. The default is FALSE .
- immediate : An ignored compatibility argument. It has no effect and can be left at its default.
The most commonly used parameters are x and value . The others are less commonly used and, in many situations, can be left to their default values.
Using the assign() Function
The most basic use of the assign() function involves assigning a value to a new variable. Here’s an example:
In this code snippet, assign("x", 10) creates a new variable x and assigns it the value 10. When we print(x), the output will be 10. You can check that x has indeed been assigned to your environment using the ls() function:
This will print all the variables in your current environment, including x .
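A minimal sketch of the steps just described:

```r
assign("x", 10)   # same effect here as x <- 10
print(x)          # [1] 10
ls()              # lists the variables in the current environment
"x" %in% ls()     # TRUE
```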
Using assign() in Different Scenarios
Dynamic variable names.
One of the significant advantages of assign() is the ability to create dynamic variable names. For instance, suppose you’re running a simulation and need to store the results of each iteration in a separately named variable. Here’s how you can do this using assign() :
In this example, we create five variables ( result_1 through result_5 ), each storing 100 random numbers from a standard normal distribution.
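The loop described above could look like the following sketch. The call to set.seed() is an addition made here only so that the random numbers are reproducible.

```r
set.seed(1)  # hypothetical seed, not part of the description above
for (i in 1:5) {
  # Build the name "result_1", "result_2", ... and assign 100 random values to it.
  assign(paste0("result_", i), rnorm(100))
}
length(result_3)
ls(pattern = "^result_")
```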
Working with Data Frames
assign() can be particularly useful when working with data frames. For example, suppose you have a list of data frames and you want to assign each data frame to a separate variable. Here’s how you can do it:
In this example, we create two variables ( df_1 and df_2 ), each storing a separate data frame from df_list .
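A sketch of this pattern, assuming a small made-up df_list (the columns id and score are illustrative only):

```r
df_list <- list(
  data.frame(id = 1:3, score = c(90, 85, 70)),
  data.frame(id = 4:5, score = c(60, 75))
)

# Create df_1, df_2, ... from the elements of the list.
for (i in seq_along(df_list)) {
  assign(paste0("df_", i), df_list[[i]])
}
nrow(df_1)  # 3
nrow(df_2)  # 2
```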
Within Functions
The assign() function can be particularly useful when used within other functions. For instance, you might want to write a function that creates new variables based on its input:
In this example, create_var() is a function that takes a variable name and a value, and it creates a new variable with that name and value in the global environment. This demonstrates how assign() can be used to modify the global environment from within a function.
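One possible shape for such a function is sketched below. Passing envir = .GlobalEnv is what directs the assignment to the global environment; without it, assign() would create the variable inside the function. The name my_var is an illustrative choice.

```r
create_var <- function(var_name, value) {
  # envir = .GlobalEnv makes the assignment happen in the global environment
  assign(var_name, value, envir = .GlobalEnv)
}

create_var("my_var", 25)
my_var  # 25
```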
Precautions
While assign() can be handy, it’s important to use it carefully and sparingly. Because it allows the creation of variables with dynamic names, it can make code harder to debug and understand, as the variables in your environment can change unexpectedly. In most cases, standard assignment ( <- or = ) or working with lists or data frames will be clearer and more straightforward.
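For comparison, here is a sketch of the list-based alternative mentioned above. The name results and the squared values are made up for illustration.

```r
results <- list()
for (i in 1:3) {
  # Store each result as a named element instead of a separate variable.
  results[[paste0("result_", i)]] <- i^2
}
results$result_2  # 4
names(results)    # "result_1" "result_2" "result_3"
```

All results live in one object, so they can be looped over or passed to a function together.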
The assign() function in R is a powerful tool for creating and assigning variables, especially when you need to create variables with dynamic names or from within functions. However, because of its power and flexibility, it should be used with care and only when necessary to keep your code easy to understand and debug.
Assignment Operators in R
R's two most commonly used assignment operators are <- and = .
Understanding their proper use is crucial for writing clear and readable R code.
Using the <- Operator
For assignments.
The <- operator is the preferred choice for assigning values to variables in R.
It clearly distinguishes assignment from argument specification in function calls.
Readability and Tradition
- This usage aligns with R’s tradition and enhances code readability.
Using the = Operator
The = operator is commonly used to explicitly specify named arguments in function calls.
It helps in distinguishing argument assignment from variable assignment.
Assignment Capability
- While = can also be used for assignment, this practice is less common and not recommended for clarity.
Mixing Up Operators
Potential confusion.
Using = for general assignments can lead to confusion, especially when reading or debugging code.
Mixing operators inconsistently can obscure the distinction between assignment and function argument specification.
- In the example above, x = 10 might be mistaken for a function argument rather than an assignment.
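To make the distinction concrete, the sketch below evaluates two calls in a fresh environment (demo_env, a name chosen for this example) and checks whether a variable x exists afterwards. Inside a function call, = only names an argument, whereas <- performs an assignment as a side effect.

```r
demo_env <- new.env()

# `x = ...` inside a call names the argument; no variable x is created.
evalq(mean(x = c(1, 2, 3)), demo_env)
before <- exists("x", envir = demo_env, inherits = FALSE)
before  # FALSE

# `x <- ...` inside a call assigns x in the calling environment as well.
evalq(mean(x <- c(4, 5, 6)), demo_env)
after <- exists("x", envir = demo_env, inherits = FALSE)
after   # TRUE
```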
Best Practices Recap
Consistency and clarity.
Use <- for variable assignments to maintain consistency and clarity.
Reserve = for specifying named arguments in function calls.
Avoiding Common Mistakes
Be mindful of the context in which you use each operator to prevent misunderstandings.
Consistently using the operators as recommended helps make your code more readable and maintainable.
Quiz: Assignment Operator Best Practices
Which of the following examples demonstrates the recommended use of assignment operators in R?
- my_var = 5; mean(x = my_var)
- my_var <- 5; mean(x <- my_var)
- my_var <- 5; mean(x = my_var)
- my_var = 5; mean(x <- my_var)
- The correct answer is 3 . my_var <- 5; mean(x = my_var) correctly uses <- for variable assignment and = for specifying a named argument in a function call.
ProgrammingR
Beginner to advanced resources for the R programming language
Assign in r – How to use R’s assign function to assign values to a variable name
Sometimes in programming, it is helpful to have alternative ways of doing things. Not only can this provide for some variety in your code, but sometimes it can also make it more readable. It turns out that the R programming language has three separate ways to assign values to a variable. Knowing all three will help you to be able to read any R program that you find.
Description
R programming has three ways of assigning values to a variable. They are =, <-, and the assign function. The assign function has the basic format of assign("variable", value), where "variable" is the name of the variable receiving the value (note that it needs to be in quotes) and value is the value being assigned to that variable. While under most circumstances they work the same, they do have differences. The assign function has the advantage that it will show up in a text search of the program when you are looking for assign, while the two operators will not show up in such a search. In practice it differs little from the assignment operators = and <-, but it does resemble similar functions from other languages.
Explanation
The assign function has two required arguments, two optional arguments you are likely to use, and two arguments that exist to maintain compatibility. The two required arguments are the variable name and the value being assigned to it. The two optional arguments are "pos", which indicates the environment the object is assigned to, and "inherits", a Boolean that indicates whether enclosing environments should also be searched for an existing variable of that name. The two compatibility arguments are "envir", which selects the environment of the object, and "immediate", which is unused and exists only for compatibility reasons. The assign function takes the variable name and assigns the indicated value to it within the specified environmental conditions. These arguments give the assign function more flexibility than the assignment operators, making it a particularly useful function under the right conditions.
Here we have four code examples that show different situations where the assign function is being used. It needs to be noted that this function can be used with any values that the assign operators can use.
    > assign("x", 17)
    > x
    [1] 17
In this example, we illustrate the most basic use of the assign function. In this case, we are simply assigning a value to a single variable.
    > assign("x", c("a", "b", "c", "d", "e", "f", "g"))
    > x
    [1] "a" "b" "c" "d" "e" "f" "g"
In this example, we illustrate the creation of a vector using the combine function.
    > assign("x", c("a", "b", "c", "d", "e", "f", "g"))
    > assign("y", 1:7)
    > assign("df", data.frame(x,y))
    > df
      x y
    1 a 1
    2 b 2
    3 c 3
    4 d 4
    5 e 5
    6 f 6
    7 g 7
In this example, we illustrate the creation of a data frame using the data frame function in its simplest form. It is easy to see how using this function to create more complex data frames could become problematic and rather cumbersome.
    > for(i in 1:5) {
    +   assign(paste0("x_", i), i)
    + }
    > x_1
    [1] 1
    > x_2
    [1] 2
    > x_3
    [1] 3
    > x_4
    [1] 4
    > x_5
    [1] 5
In this example, we illustrate how to use the assign function along with the paste0 function to create multiple variables with different values.
Application
The main application of the assign function is to assign values to variables. However, this function has additional arguments that give it increased environmental flexibility, allowing you to control whether or not the object that you are creating is restricted to a particular environment. This makes the assign function more powerful in some ways than the assign operators . As a result, an important application of this function is to be able to assign values to variables under circumstances where the program needs to decide on the fly the environmental limitations of the object being created. If you need this flexibility, that is the time to use the assign function. The other situation where this function comes in handy would be if you were creating code that is designed to be as searchable as possible because the word “assign” is a search term that someone is likely to use.
The assign function may at first glance seem redundant, but its additional arguments give it added flexibility over the assign operators. As a result, this function can actually be a useful tool. This flexibility makes it a powerful tool that you will probably find more useful as you get more experience with it.
assign: Assign a Value to a Name
Description.
Assign a value to a name in an environment.
x: a variable name, given as a character string. No coercion is done, and the first element of a character vector of length greater than one will be used, with a warning.
value: a value to be assigned to x.
pos: where to do the assignment. By default, assigns into the current environment. See ‘Details’ for other possibilities.
envir: the environment to use. See ‘Details’.
inherits: should the enclosing frames of the environment be inspected?
immediate: an ignored compatibility feature.
This function is invoked for its side effect, which is assigning value to the variable x . If no envir is specified, then the assignment takes place in the currently active environment.
If inherits is TRUE , enclosing environments of the supplied environment are searched until the variable x is encountered. The value is then assigned in the environment in which the variable is encountered (provided that the binding is not locked: see lockBinding : if it is, an error is signaled). If the symbol is not encountered then assignment takes place in the user's workspace (the global environment).
If inherits is FALSE , assignment takes place in the initial frame of envir , unless an existing binding is locked or there is no existing binding and the environment is locked (when an error is signaled).
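The effect of inherits can be seen in a small sketch. The functions outer_fun and inner_fun and the argument search_enclosing are hypothetical names introduced for this illustration.

```r
outer_fun <- function(search_enclosing) {
  total <- 0
  inner_fun <- function() {
    # inherits = TRUE: the existing `total` in outer_fun is found and updated.
    # inherits = FALSE: a new local `total` is created inside inner_fun.
    assign("total", 100, inherits = search_enclosing)
  }
  inner_fun()
  total
}

outer_fun(search_enclosing = TRUE)   # 100
outer_fun(search_enclosing = FALSE)  # 0
```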
There are no restrictions on the name given as x : it can be a non-syntactic name (see make.names ).
The pos argument can specify the environment in which to assign the object in any of several ways: as -1 (the default), as a positive integer (the position in the search list); as the character string name of an element in the search list; or as an environment (including using sys.frame to access the currently active function calls). The envir argument is an alternative way to specify an environment, but is primarily for back compatibility.
assign does not dispatch assignment methods, so it cannot be used to set elements of vectors, names, attributes, etc.
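A brief sketch of what this means in practice: because the name is taken literally and may be non-syntactic, assign("v[1]", ...) creates a new variable called v[1] instead of modifying the first element of v.

```r
v <- c(1, 2, 3)

assign("v[1]", 99)  # creates a variable literally named `v[1]`
v                   # unchanged: 1 2 3
`v[1]`              # 99, accessed with backticks

v[1] <- 99          # the replacement form does modify v
v                   # 99 2 3
```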
Note that assignment to an attached list or data frame changes the attached copy and not the original object: see attach and with .
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language . Wadsworth & Brooks/Cole.
<- , get , the inverse of assign() , exists , environment .
Stats and R
Data manipulation in R
Note that this article is inspired from a workshop entitled “Introduction to data analysis with R”, given by UCLouvain’s Statistical Methodology and Computing Service. See all their workshops on their website .
Not all data frames are as clean and tidy as you would expect. Therefore, after importing your data frame into RStudio , most of the time you will need to prepare it before performing any statistical analyses. Data manipulation can even sometimes take longer than the actual analyses when the quality of the data is poor.
Data manipulation includes a broad range of tools and techniques. We present here in detail the manipulations that you will most likely need for your projects in R. Do not hesitate to let me know (as a comment at the end of this article for example) if you find other data manipulations essential so that I can add them.
In this article we show the main functions to manipulate data in R. We first illustrate these functions on vectors, factors and lists. We then illustrate the main functions to manipulate data frames and dates/times in R.
For those who are interested in going further, see also an introduction to data manipulation in R with the {dplyr} package .
We can concatenate (i.e., combine) numbers or strings with c() :
Note that by default R displays 7 significant digits. You can modify it with options(digits = 2) (two significant digits).
It is also possible to create a sequence of consecutive integers :
seq() creates a vector defined by a sequence. You can either choose the increment:
or its length:
On the other hand, rep() creates a vector which is the repetition of numbers or strings:
You can also create a vector which is the repetition of numbers and strings:
but in that case, the number 2 will be considered as a string too (and not as a numeric ) since there is at least one string in the vector.
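The functions in this passage can be tried out with the following sketch. The specific numbers and the name mixed are arbitrary choices for illustration.

```r
c(2, 7, 1.5)                           # concatenate numbers
c("a", "b")                            # concatenate strings
1:10                                   # consecutive integers
seq(from = 2, to = 5, by = 0.5)        # sequence with a chosen increment
seq(from = 2, to = 5, length.out = 7)  # sequence with a chosen length
rep(1, times = 3)                      # repeat a number

mixed <- rep(c("a", 2), times = 2)     # numbers and strings together
mixed
class(mixed)                           # "character": the 2 became "2"
```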
There are three ways to assign an object in R:
You can also assign a vector to another vector, for example:
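A sketch of the three forms (=, <- and assign()) and of assigning one vector to another; the names x, y, z, w and w2 are illustrative.

```r
x = 2             # with =
y <- 7            # with <-
assign("z", 4)    # with assign()

w <- c(x, y, z)   # build a vector from existing objects
w2 <- w           # assign a vector to another vector
w2
```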
We can select one or several elements of a vector by specifying its position between square brackets:
Note that in R the numbering of the indices starts at 1 (and not 0 as in many other programming languages), so x[1] gives the first element of the vector x .
We can also use booleans (i.e., TRUE or FALSE ) to select some elements of a vector. This method selects only the elements corresponding to TRUE :
Or we can give the elements to withdraw:
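The selection methods above, sketched on a made-up vector x:

```r
x <- c(2.1, 5, -4, 1, 5)

x[2]                                   # second element
x[c(1, 3)]                             # first and third elements
x[c(TRUE, FALSE, TRUE, FALSE, FALSE)]  # keep the elements marked TRUE
x[x > 2]                               # logical condition
x[-c(1, 2)]                            # withdraw the first two elements
```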
The main types of a vector are numeric , logical and character . For more details on each type, see the different data types in R .
class() gives the vector type:
As you can see above, the class of a vector will be numeric only if all of its elements are numeric. As soon as one element is a character, the class of the vector will be a character.
length() gives the length of a vector:
So to select the last element of a vector (in a dynamic way), we can use a combination of length() and [] :
We can find the type of a vector with the family of is.type functions:
Or in a more generic way with the is() function:
We can change the type of a vector with the as.numeric() , as.logical() and as.character() functions:
It is also possible to change its length:
As you can see, the first elements of the vector are conserved while all others are removed. In this case, the first 4 since we specified a length of 4.
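The type and length functions discussed in this passage can be sketched as follows, again on a made-up vector x:

```r
x <- c(2.1, 5, -4, 1, 5)

class(x)          # "numeric"
class(c(x, "a"))  # "character" as soon as one element is a string
length(x)         # 5
x[length(x)]      # last element, whatever the length
is.numeric(x)     # TRUE
is(x)             # more generic description of the type
as.character(x)   # change the type

length(x) <- 4    # keep only the first 4 elements
x
```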
The basic numerical operators such as + , - , * , / and ^ can be applied to vectors:
It is also possible to compute the minimum, maximum , sum, product, cumulative sum and cumulative product of a vector:
The following mathematical operations can be applied too:
- sqrt() (square root)
- cos() (cosine)
- sin() (sine)
- tan() (tangent)
- log() (logarithm)
- log10() (base 10 logarithm)
- exp() (exponential)
- abs() (absolute value)
If you need to round a number, you can use the round() , floor() and ceiling() functions:
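A short sketch of these numerical functions on a made-up vector:

```r
x <- c(2.1, 5, -4, 1)

x * 2                      # operators work element by element
x^2
min(x); max(x); sum(x); prod(x)
cumsum(x)                  # cumulative sum
cumprod(x)                 # cumulative product
sqrt(abs(x))               # combine functions

round(2.567, digits = 1)   # 2.6
floor(2.567)               # 2
ceiling(2.567)             # 3
```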
The most common logical operators in R are:
- Negation: !
- Comparisons: < , <= , >= , > , == (equality), != (difference)
As the names suggest, all() returns TRUE if conditions are met for all elements, whereas any() returns TRUE if conditions are met for at least one element of a vector:
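For instance, with a made-up vector x:

```r
x <- c(1, 3, 5)

all(x > 0)   # TRUE: every element is positive
any(x > 4)   # TRUE: at least one element exceeds 4
all(x > 4)   # FALSE
!(x == 3)    # negation: TRUE FALSE TRUE
```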
You can paste two vectors (or more) together:
The argument sep stands for separator and specifies the character(s) or symbol(s) used to separate the character strings.
If you do not want to specify a separator, you can use sep = "" or the paste0() function:
To find the positions of the elements containing a given string, use the grep() function:
To extract a character string based on the beginning and the end positions, we can use the substr() function:
Replace a character string by another one if it exists in the vector by using the sub() function:
Split a character string based on a specific symbol with the strsplit() function:
To transform a character vector to uppercase and lowercase:
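The string functions above can be sketched together; the vector code and its contents are made up for illustration.

```r
code <- paste(c("BE", "BE", "FR"), c(2, 4, 5), sep = "/")
code                               # "BE/2" "BE/4" "FR/5"
paste0("id", 1:3)                  # no separator: "id1" "id2" "id3"

grep("BE", code)                   # positions containing "BE": 1 2
substr(code, start = 1, stop = 2)  # "BE" "BE" "FR"
replaced <- sub("BE", "Belgium", code)
replaced
strsplit(code, split = "/")        # a list with one element per string

toupper("abc")                     # "ABC"
tolower("ABC")                     # "abc"
```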
We can sort the elements of a vector from smallest to largest, or from largest to smallest:
order() gives the permutation to apply to the vector in order to sort its elements:
As you can see, the third element of the vector is the smallest and the second element is the largest. This is indicated by the 3 at the beginning of the output, and the 2 at the end of the output.
Like sort() the decreasing = TRUE argument can also be added:
In this case, the 2 in the output indicates that the second element of the vector is the largest, while the 3 indicates that the third element is the smallest.
rank() gives the ranks of the elements:
The two last elements of the vector have a rank of 2.5 because they are equal and they come after the first but before the fourth rank.
We can also reverse the elements (from the last one to the first one):
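A sketch of the ordering functions, using two made-up vectors chosen so that the outputs match the descriptions above (the third element of x is the smallest and the second is the largest; the last two elements of y are tied):

```r
x <- c(4, 9, 1, 6)
sort(x)                      # 1 4 6 9
sort(x, decreasing = TRUE)   # 9 6 4 1
order(x)                     # 3 1 4 2
order(x, decreasing = TRUE)  # 2 4 1 3
rev(x)                       # 6 1 9 4

y <- c(1, 8, 3, 3)
rank(y)                      # 1.0 4.0 2.5 2.5
```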
Factors in R are vectors with a list of levels, also referred to as categories. Factors are useful for qualitative data such as gender, civil status, eye color, etc.
We create factors with the factor() function (do not forget the c() ):
We can of course create a factor from an existing vector:
We can also specify that the levels are ordered by adding the ordered = TRUE argument:
Note that the order of the levels will follow the order that is specified in the labels argument.
To know the names of the levels:
For the number of levels:
In R, the first level is always the reference level. This reference level can be modified with relevel() :
You see that “T3” is now the first and thus the reference level. Changing the reference level has an impact on the order they are displayed or treated in statistical analyses. Compare, for instance, boxplots with different reference levels.
To know the frequencies for each level:
Note that the relative frequencies (i.e., the proportions) can be found with the combination of prop.table() and table() or summary() :
Remember that a factor is coded in R as a numeric vector even though it looks like a character one. We can transform a factor into its numerical equivalent with the as.numeric() function:
And a numeric vector can be transformed into a factor with the as.factor() or factor() function:
The advantage of factor() is that it is possible to specify a name for each level:
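The factor functions in this passage, sketched on made-up data. The level names T1 to T3 follow the text above; the labels "low" and "high" are illustrative.

```r
f <- factor(c("T1", "T3", "T1", "T2"))
levels(f)                     # "T1" "T2" "T3"
nlevels(f)                    # 3

f2 <- relevel(f, ref = "T3")  # make "T3" the reference level
levels(f2)                    # "T3" "T1" "T2"

table(f)                      # frequencies per level
prop.table(table(f))          # relative frequencies
as.numeric(f)                 # underlying codes: 1 3 1 2

# factor() can name each level through the labels argument.
g <- factor(c(1, 2, 1), levels = c(1, 2), labels = c("low", "high"))
g
```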
A list is a vector whose elements can be of different natures: a vector, a list, a factor, numeric or character, etc.
The function list() creates lists:
There are several methods to extract elements from a list:
To transform a list into a vector:
attributes() gives the names of the elements (it can be used on every R object):
str() gives a short description about the elements (it can also be used on every R object):
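A sketch of the list operations above; the list trip and its contents are made up for illustration.

```r
trip <- list(
  plane = c("Airbus", "Boeing"),
  departure = c("Brussels", "Paris"),
  duration = 22
)

trip$plane               # extract by name with $
trip[["departure"]][2]   # extract by name, then take the second element
trip[[3]]                # extract by position
unlist(trip)             # flatten into a (character) vector
attributes(trip)         # names of the elements
str(trip)                # short description of each element
```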
Data frames
Every imported file in R is a data frame (at least if you do not use a package to import your data in R ). A data frame is a mix of a list and a matrix: it has the shape of a matrix but the columns can have different classes.
Remember that the gold standard for a data frame is that:
- columns represent variables
- lines correspond to observations and
- each value must have its own cell
In this article, we use the data frame cars to illustrate the main data manipulation techniques. Note that the data frame ships with R in the datasets package (so you do not need to import it) and I use the generic name dat as the name of the data frame throughout the article (see here why I always use a generic name instead of more specific names).
Here is the whole data frame:
This data frame has 50 observations with 2 variables ( speed and dist , the stopping distance).
You can check the number of observations and variables with nrow() and ncol() respectively, or both at the same time with dim() :
Before manipulating a data frame, it is interesting to know the line and column names:
To know only the column names:
And to know only the row names:
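These checks can be sketched as follows, using the cars data frame under the generic name dat as in the text:

```r
dat <- cars

nrow(dat)            # 50 observations
ncol(dat)            # 2 variables
dim(dat)             # 50 2
names(dat)           # column names: "speed" "dist"
head(rownames(dat))  # first few row names
```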
Subset a data frame
- To keep only the first 10 observations:
- To keep only the last 5 observations:
- To draw a sample of 4 observations without replacement:
If you know what observation(s) or column(s) you want to keep, you can use the row or column number(s) to subset your data frame. We illustrate this with several examples:
- keep all the variables for the \(3^{rd}\) observation:
- keep the \(2^{nd}\) variable for all observations:
- You can mix the two above methods to keep only the \(2^{nd}\) variable of the \(3^{rd}\) observation:
- keep several observations; for example observations \(1\) to \(5\) , the \(10^{th}\) and the \(15^{th}\) observation for all variables:
- remove observations 5 to 45:
- tip: to keep only the last observation, use nrow() instead of the row number:
This way, no matter the number of observations, you will always select the last one. This technique of using a piece of code instead of a specific value is to avoid “hard coding”. Hard coding is generally not recommended (unless you want to specify a parameter that you are sure will never change) because if your data frame changes, you will need to manually edit your code.
As you probably figured out by now, you can select observations and/or variables of a dataset by running dataset_name[row_number, column_number] . When the row (column) number is left empty, the entire row (column) is selected.
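The subsetting operations listed above can be sketched on the cars data frame (named dat as in the text):

```r
dat <- cars

head(dat, n = 10)                   # first 10 observations
tail(dat, n = 5)                    # last 5 observations
dat[sample(nrow(dat), size = 4), ]  # 4 random rows, without replacement

dat[3, ]                            # all variables for the 3rd observation
dat[, 2]                            # 2nd variable for all observations
dat[3, 2]                           # 2nd variable of the 3rd observation
dat[c(1:5, 10, 15), ]               # observations 1 to 5, 10 and 15
dat[-c(5:45), ]                     # remove observations 5 to 45
dat[nrow(dat), ]                    # last observation, without hard coding
```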
Note that all examples presented above also work for matrices:
To select one variable of the dataset based on its name rather than on its column number, use dataset_name$variable_name :
Accessing variables inside a data frame with this second method is strongly recommended compared to the first if you intend to modify the structure of your database. Indeed, if a column is added or removed in the data frame, the numbering will change. Therefore, variables are generally referred to by their name rather than by their position (column number). In addition, it is easier to understand and interpret code with the name of the variable written (another reason to give variables a concise but clear name). There is only one reason why I would still use the column number: if the variable names are expected to change while the structure of the data frame will not.
To select variables, it is also possible to use the select() command from the powerful dplyr package (for compactness only the first 6 observations are displayed thanks to the head() command):
This is equivalent to removing the distance variable:
Instead of subsetting a data frame based on row/column numbers or variable names, you can also subset it based on one or multiple criterion:
- keep only observations with speed larger than 20. The first argument refers to the name of the data frame, while the second argument refers to the subset criteria:
- keep only observations with distance smaller than or equal to 50 and speed equal to 10. Note the == (and not = ) for the equal criteria:
- use | to keep only observations with distance smaller than 20 or speed equal to 10:
- to filter out some observations, use != . For instance, to keep observations with speed not equal to 24 and distance not equal to 120 (for compactness only the last 6 observations are displayed thanks to the tail() command):
Note that it is also possible to subset a data frame with split() :
The above code will split your data frame into a list of data frames, one for each level of the factor variable.
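The criteria-based subsetting described above can be sketched with subset() on the cars data (column dist holds the distance). The grouping used for split(), speed above 20 or not, is a made-up example since cars has no factor column.

```r
dat <- cars

fast <- subset(dat, speed > 20)           # speed larger than 20
subset(dat, dist <= 50 & speed == 10)     # both conditions must hold
subset(dat, dist < 20 | speed == 10)      # at least one condition holds
tail(subset(dat, speed != 24 & dist != 120))

groups <- split(dat, dat$speed > 20)      # a list of two data frames
names(groups)                             # "FALSE" "TRUE"
```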
Create a new variable
Often, a data frame can be enhanced by creating new variables based on other variables from the initial data frame, or simply by adding a new variable manually.
In this example, we create two new variables; one being the speed times the distance (which we call speed_dist ) and the other being a categorization of the speed (which we call speed_cat ). We then display the first 6 observations of this new data frame with the 4 variables:
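A sketch of how these two variables could be created. The cutoff of 7 and the labels "high speed" and "low speed" are assumptions made for this example.

```r
dat <- cars

# Product of speed and distance.
dat$speed_dist <- dat$speed * dat$dist

# Categorization of the speed (hypothetical cutoff and labels).
dat$speed_cat <- factor(ifelse(dat$speed > 7, "high speed", "low speed"))

head(dat)  # first 6 observations with the 4 variables
```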
Note that in programming, a character string is generally surrounded by quotes (e.g., "character string" ) and R is not an exception.
To transform a continuous variable into a categorical variable (also known as qualitative variable ):
This transformation is for example often done on age, when the age (a continuous variable) is transformed into a qualitative variable representing different age groups.
In surveys with Likert scales (used in psychology, among others), it is often the case that we need to compute a score for each respondent based on multiple questions. The score is usually the mean or the sum of all the questions of interest.
This can be done with rowMeans() and rowSums() . For instance, let’s compute the mean and the sum of the variables speed , dist and speed_dist (variables must be numeric of course as a sum and a mean cannot be computed on qualitative variables!) for each row and store them under the variables mean_score and total_score :
It is also possible to compute the mean and sum by column with colMeans() and colSums() :
This is equivalent to:
but it lets you do it for several variables at once.
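To illustrate the equivalence, here is a short sketch on the built-in `cars` data frame (an assumption; the original code is not shown):

```r
# Column-wise means and sums for every numeric column at once
col_means <- colMeans(cars)
col_sums <- colSums(cars)

# The same quantity computed for a single column
mean(cars$speed)
sum(cars$speed)

col_means
col_sums
```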
Categorical variables and labels management
For categorical variables, it is a good practice to use the factor format and to name the different levels of the variables.
- for this example, let’s create another new variable called dist_cat based on the distance and then change its format from numeric to factor (while also specifying the labels of the levels):
- to check the format of a variable:
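The two steps above can be sketched as follows. The data frame is assumed to be a copy of `cars`, and the cutoff of 15 is a hypothetical choice; the labels follow the ones named later in the text:

```r
# Assumption: `dat` is a copy of the built-in `cars`
dat <- cars

# Numeric coding first (1 = small, 2 = big); the cutoff 15 is hypothetical
dat$dist_cat <- ifelse(dat$dist < 15, 1, 2)

# Convert from numeric to factor and attach labels to the levels
dat$dist_cat <- factor(
  dat$dist_cat,
  levels = c(1, 2),
  labels = c("small distance", "big distance")
)

# Check the format of the variable
class(dat$dist_cat)
levels(dat$dist_cat)
```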
This will be sufficient if you need to format only a limited number of variables. However, if you need to do it for a large number of categorical variables, it quickly becomes time-consuming to write the same code many times. As you can imagine, it is possible to format many variables without writing the entire code for each variable one by one, by using the within() command:
Alternatively, if you want to transform several numeric variables into categorical variables without changing the labels, it is best to use the transform() function. We illustrate this function with the mpg data frame from the {ggplot2} package:
It is possible to recode labels of a categorical variable if you are not satisfied with the current labels. In this example, we change the labels as follows:
- “small distance” becomes “short distance”
- “big distance” becomes “large distance”
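One way to recode labels in base R is to assign to `levels()`. The small factor below is made up for illustration:

```r
# Hypothetical factor with the original labels
dist_cat <- factor(
  c(1, 2, 2, 1),
  levels = c(1, 2),
  labels = c("small distance", "big distance")
)

# Rename each level by matching on its current label
levels(dist_cat)[levels(dist_cat) == "small distance"] <- "short distance"
levels(dist_cat)[levels(dist_cat) == "big distance"] <- "large distance"

levels(dist_cat)
```

Matching on the current label, rather than on position, avoids accidentally swapping labels if the level order differs from what you expect.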
For some analyses, you might want to change the order of the levels. For example, if you are analyzing data about a control group and a treatment group, you may want to set the control group as the reference group. By default, levels are sorted alphabetically, or by numeric value if the factor was created from a numeric variable.
- to check the current order of the levels (the first level being the reference):
In this case, “short distance” is the first level and is therefore the reference level. It is the first level because it was initially set with a value equal to 1 when creating the variable.
- to change the reference level:
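Checking and changing the reference level can be sketched with `levels()` and `relevel()`; the factor here is a small made-up example:

```r
# Hypothetical factor whose first level is "short distance"
dist_cat <- factor(
  c("short distance", "large distance", "short distance"),
  levels = c("short distance", "large distance")
)

# Current order of the levels (the first one is the reference)
levels(dist_cat)

# Move "large distance" to the front so it becomes the reference
dist_cat <- relevel(dist_cat, ref = "large distance")
levels(dist_cat)
```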
Large distance is now the first and thus the reference level.
To rename variable names as follows:
- dist \(\rightarrow\) distance
- speed_dist \(\rightarrow\) speed_distance
- dist_cat \(\rightarrow\) distance_cat
use the rename() command from the dplyr package:
Although most analyses are performed on an imported data frame, it is also possible to create a data frame directly in R:
By default, the merge is done on the common variables (variables that have the same name). However, if they do not have the same name, it is still possible to merge the two data frames by specifying their names:
We want to merge the two data frames by the subject number, but this number is referred to as person in the first data frame and patient in the second data frame, so we need to indicate it:
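A minimal sketch with two made-up data frames; the column names `person` and `patient` come from the text, while all other columns and values are assumptions:

```r
# Hypothetical data frames that identify subjects under different names
dat1 <- data.frame(person = 1:4, treatment = c("T1", "T2", "T1", "T2"))
dat2 <- data.frame(patient = 1:4, age = c(56, 23, 40, 31))

# by.x names the key in the first data frame, by.y the key in the second
dat_merged <- merge(dat1, dat2, by.x = "person", by.y = "patient")
dat_merged
```

The key column keeps the name it has in the first data frame (`person`).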
In order to add new observations from another data frame, the two data frames need to have the same column names (but they can be in a different order):
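A sketch with made-up values; the names `dat1` and `dat3` follow the text, and the columns are assumptions:

```r
# Hypothetical data frames with the same column names in a different order
dat1 <- data.frame(person = 1:4, treatment = c("T1", "T2", "T1", "T2"))
dat3 <- data.frame(treatment = c("T1", "T2", "T1", "T2"), person = 5:8)

# rbind() matches data frame columns by name, not by position
dat_all <- rbind(dat1, dat3)
dat_all
```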
As you can see, data for persons 5 to 8 have been added at the end of the data frame dat1 (because dat1 comes before dat3 in the rbind() function).
It is also possible to add new variables to a data frame with the cbind() function. Unlike rbind() , column names do not have to be the same since they are added next to each other:
If you want to add only a specific variable from another data frame:
or more simply with the data.frame() function:
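The three variants described above might look like this, using made-up data frames (column names other than `person` and `patient` are assumptions):

```r
# Hypothetical data frames with the same number of rows
dat1 <- data.frame(person = 1:4, treatment = c("T1", "T2", "T1", "T2"))
dat2 <- data.frame(patient = 1:4, age = c(56, 23, 40, 31))

# Add all columns of dat2 next to those of dat1
dat_cbind <- cbind(dat1, dat2)

# Add only one specific variable from dat2
dat_one <- cbind(dat1, age = dat2$age)

# The same result with data.frame()
dat_df <- data.frame(dat1, age = dat2$age)
```

Note that `cbind()` pairs rows by position only, so both data frames must already be in the same row order.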
Missing values
Missing values (represented by NA in R, for “Not Available”) are problematic for many analyses because many computations that include a missing value return a missing value as the result.
For instance, the mean of a series or variable with at least one NA will give an NA as a result. The data frame dat created in the previous section is used for this example:
The na.omit() function avoids the NA result by dropping the missing values before the computation:
Moreover, most basic functions include an argument to deal with missing values:
is.na() indicates if an element is a missing value or not:
Note that “NA” as a string is not considered as a missing value:
To check whether there is at least one missing value in a vector or data frame:
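The functions discussed above can be tried on a small vector. The values below are made up for illustration rather than taken from the article's data frame:

```r
# Hypothetical vector with one missing value
x <- c(2, NA, 7, 4)

mean(x)                 # the NA propagates to the result
mean(na.omit(x))        # drop missing values first
mean(x, na.rm = TRUE)   # or use the built-in argument

is.na(x)                # TRUE only for the missing element
is.na("NA")             # the string "NA" is not a missing value

anyNA(x)                # is there at least one missing value?
```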
Nonetheless, data frames with NAs are still problematic for some types of analysis. Several alternatives exist to remove or impute missing values.
A simple solution is to remove all observations (i.e., rows) containing at least one missing value. This is done by keeping only observations with complete cases:
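A minimal sketch with `complete.cases()` on a made-up data frame:

```r
# Hypothetical data frame with missing values in two rows
dat <- data.frame(
  person = 1:4,
  age = c(56, NA, 40, 31),
  weight = c(70, 65, NA, 80)
)

# Keep only the rows without any missing value
dat_complete <- dat[complete.cases(dat), ]
dat_complete
```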
Be careful when removing observations with missing values, especially if missing values are not “missing at random”. Just because it is possible (and easy) to remove them does not mean you should do so in all cases. This is, however, beyond the scope of the present article.
Instead of removing observations with at least one NA, it is possible to impute them, that is, replace them by some values such as the median or the mode of the variable. This can be done easily with the command impute() from the package Hmisc :
When the median/mode method is used (the default), character vectors and factors are imputed with the mode. Numeric and integer vectors are imputed with the median. Again, use imputations carefully. Other packages offer more advanced imputation techniques. However, we keep it simple and straightforward for this article, as advanced imputation is beyond the scope of introductory data manipulation in R.
Scaling (also referred to as standardizing) a variable is often used before a Principal Component Analysis (PCA) 1 when variables of a data frame have different units. Scaling a variable means computing the mean and the standard deviation of that variable, then “scaling” each value (so each row) of that variable by subtracting the mean and dividing by the standard deviation. Formally:
\[z = \frac{x - \bar{x}}{s}\]
where \(\bar{x}\) and \(s\) are the mean and the standard deviation of the variable, respectively.
To scale one or more variables in R use scale() :
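A short sketch on the built-in `cars` data frame (an assumption; the article applies `scale()` to its own data), with a manual check against the formula above:

```r
# scale() centers each column on its mean and divides by its standard deviation
dat_scaled <- scale(cars)

# Manual computation of z = (x - mean) / sd for one column
speed_manual <- (cars$speed - mean(cars$speed)) / sd(cars$speed)

# Both approaches should agree up to floating-point error
all.equal(as.numeric(dat_scaled[, "speed"]), speed_manual)

head(dat_scaled)
```

Note that `scale()` returns a matrix; wrap the result in `as.data.frame()` if a data frame is needed.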
Dates and times
In R the default date format follows the rules of the ISO 8601 international standard which expresses a day as “2001-02-13” (yyyy-mm-dd). 2
A date can be defined by a string of characters or a number. For example, October 1st, 2016:
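A sketch of both approaches with `as.Date()`. The numeric form counts days since an origin, here assumed to be 1970-01-01:

```r
# From a string in the default ISO 8601 format
d1 <- as.Date("2016-10-01")

# From a number of days since the origin 1970-01-01
d2 <- as.Date(17075, origin = "1970-01-01")

# From a string in another format, described with strptime-style codes
d3 <- as.Date("01/10/2016", format = "%d/%m/%Y")

d1
d1 == d2
d1 == d3
```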
An example with date and time vectors:
Find more information on how to express a date and time format with help(strptime) .
We can extract:
If a copy-paste is not sufficient, you can save an object in R format with save() :
or using write.table() , write.csv() or write.xlsx() :
If you need to send all results to a file instead of the console:
(Don’t forget to stop it with sink() .)
You can always find some help about:
- a function: ?function or help(function)
- a package: help(package = packagename)
- a concept: help.search("concept") or apropos("concept")
Otherwise, Google is your best friend!
Thanks for reading.
I hope this article helped you to manipulate your data in RStudio. For those who are interested in going further, see also an introduction to data manipulation in R with the {dplyr} package .
Now that you know how to import a data frame into R and how to manipulate it, the next step would probably be to learn how to perform descriptive statistics in R . If you are looking for more advanced statistical analyses using R, see all articles about R .
As always, if you have a question or a suggestion related to the topic covered in this article, please add it as a comment so other readers can benefit from the discussion.
Principal Component Analysis (PCA) is a useful technique for exploratory data analysis, allowing a better visualization of the variation present in a data frame with a large number of variables. When there are many variables, the data cannot easily be illustrated in their raw format. To counter this, the PCA takes a data frame with many variables and simplifies it by transforming the original variables into a smaller number of “principal components”. The first dimension contains the most variance in the data frame and so on, and the dimensions are uncorrelated. Note that PCA is done on quantitative variables. ↩︎
For your information, note that this date format is not the same for every software! Excel, for instance, uses a different format. ↩︎
R Variables
Creating Variables in R
Variables are containers for storing data values.
R does not have a command for declaring a variable. A variable is created the moment you first assign a value to it. To assign a value to a variable, use the <- sign. To output (or print) the variable value, just type the variable name:
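The original example is not reproduced in this text. A minimal version consistent with the description that follows (variables `name` and `age` holding "John" and 40) would be:

```r
# Create two variables by assigning values to them
name <- "John"
age <- 40

# Typing a variable name at the console prints its value
name
age
```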
From the example above, name and age are variables , while "John" and 40 are values .
In other programming languages, it is common to use = as an assignment operator. In R, we can use both = and <- as assignment operators.
However, <- is preferred in most cases because the = operator can be forbidden in some contexts in R.
Character variables can be declared by either using single or double quotes:
Print / Output Variables
Compared to many other programming languages, you do not have to use a function to print/output variables in R. You can just type the name of the variable:
However, R does have a print() function available if you want to use it. This might be useful if you are familiar with other programming languages, such as Python , which often use a print() function to output variables.
And there are times you must use the print() function to output code, for example when working with for loops (which you will learn more about in a later chapter):
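A small sketch of the situation described: inside a for loop, a bare variable name is evaluated but not shown, so `print()` is needed to see the value.

```r
# A bare name inside the loop body produces no output
for (x in 1:3) {
  x
}

# Wrapping it in print() shows each value
for (x in 1:3) {
  print(x)
}
```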
Conclusion: It is up to you whether to use the print() function to output values. However, when your code is inside an R expression (for example inside curly braces {} like in the example above), use the print() function if you want to output the result.
assign Function in R (2 Examples)
In this tutorial, I’ll illustrate how to assign values to a variable name using the assign() function in R .
Sound good? Great, here’s how to do it…
Definition & Basic R Syntax of assign Function
Definition: The assign R function assigns values to a variable name.
Basic R Syntax: Please find the basic R programming syntax of the assign function below.
In the remaining article, I’ll show you two examples for the application of the assign function in the R programming language.
Example 1: Using assign Function to Create New Vector
In Example 1, I’ll show how to use the assign() command to assign numeric values to a new vector object.
Within the assign function, we have to specify the name of the new vector (i.e. “x”) and the values we want to store in this vector object (i.e. five numeric values ranging from 1 to 5):
Let’s have a look at our new data object x:
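The original code and console output are not shown in this text. A sketch matching the description (a variable named `x` holding the values 1 to 5) could be:

```r
# assign() takes the variable name as a character string, then the value
assign("x", 1:5)

# Print the new data object
x
```

In this simple form, `assign("x", 1:5)` has the same effect as `x <- 1:5`.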
As you can see based on the previous output of the RStudio console, we have saved a numeric sequence from 1 to 5 in a new variable called x.
Example 2: Using assign & paste0 Functions Dynamically in for-Loop
In Example 2, I’ll show how to use the assign function in combination with the paste0 function to create new variable names within a for-loop dynamically.
Within the assign function, we are using paste0 to create new variable names combining the prefix “x_” with the running index i:
Let’s have a look at the output variables:
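A sketch of the loop described above. The prefix "x_" comes from the text; the number of iterations and the values stored are assumptions:

```r
# Create x_1, x_2 and x_3, each holding its own index (hypothetical values)
for (i in 1:3) {
  assign(paste0("x_", i), i)
}

# Inspect the variables created by the loop
x_1
x_2
x_3
```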
Video, Further Resources & Summary
In addition, you might have a look at the related tutorials on this homepage .
- for-Loop in R
- paste & paste0 R Functions
- R Functions List (+ Examples)
- The R Programming Language
In this R tutorial you learned how to create new data objects using assign() . In case you have further questions, let me know in the comments section.
R Variables – Creating, Naming and Using Variables in R
A variable is a named location in memory that stores specific data; the name associated with the variable is used to refer to the stored value.
The name given to a variable is known as its variable name . Usually a single variable stores only the data belonging to a certain data type.
They are called variables because the value they hold can change while the program executes.
Variables in R
R is a dynamically typed language, i.e. R variables are not declared with a data type; instead, they take the data type of the R object assigned to them.
This feature is shared by languages like Python and PHP.
Creating Variables in R Language
Let’s look at ways of declaring and initializing variables in R language:
R supports three ways of variable assignment:
- Using the equal operator: the value on the right of = is assigned to the name on the left.
- Using the leftward operator: data is copied from right to left.
- Using the rightward operator: data is copied from left to right.
Syntax for creating R Variables
Types of Variable Creation in R:
- Using the equal operator: variable_name = value
- Using the leftward operator: variable_name <- value
- Using the rightward operator: value -> variable_name
Creating Variables in R With Example
Let’s look at the live example of creating Variables in R:
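The original example is not reproduced here. A minimal sketch of the three forms, with hypothetical variable names and values:

```r
# Equal operator
var1 = "hello"

# Leftward operator
var2 <- "hello"

# Rightward operator
"hello" -> var3

print(var1)
print(var2)
print(var3)
```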
Nomenclature of R Variables
The following rules need to be kept in mind while naming a R variable:
- A valid variable name consists of a combination of alphabets, numbers, dot(.), and underscore(_) characters. Example: var.1_ is valid
- Apart from the dot and underscore characters, no other special character is allowed. Example: var$1 or var#1 both are invalid
- Variables can start with alphabets or dot characters. Example: .var or var is valid
- The variable should not start with numbers or underscore. Example: 2var or _var is invalid.
- If a variable starts with a dot the next thing after the dot cannot be a number. Example: .3var is invalid
- The variable name should not be a reserved keyword in R. Example: TRUE, FALSE,etc.
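These rules can be checked with base R's `make.names()`, which rewrites a string into a syntactically valid name; a string that comes back unchanged was already valid. The helper `is_valid_name()` is a hypothetical convenience function, not part of R:

```r
# Hypothetical helper: TRUE when the string is already a syntactic name
is_valid_name <- function(n) make.names(n) == n

# Examples taken from the rules above
candidates <- c("var.1_", ".var", "var", "var$1", "2var", "_var", ".3var", "TRUE")

data.frame(name = candidates, valid = is_valid_name(candidates))
```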
Important Methods for R Variables
R provides some useful methods to perform operations on variables. These methods are used to determine the data type of the variable, finding a variable, deleting a variable, etc. Following are some of the methods used to work on variables:
1. class() function
This built-in function is used to determine the data type of the variable provided to it.
The R variable to be checked is passed to this as an argument and it prints the data type in return.
2. ls() function
This built-in function is used to know all the present variables in the workspace.
This is generally helpful when dealing with a large number of variables at once and helps prevent overwriting any of them.
Syntax
3. rm() function
This is again a built-in function used to delete an unwanted variable within your workspace.
This helps clear the memory space allocated to certain variables that are not in use thereby creating more space for others. The name of the variable to be deleted is passed as an argument to it.
Syntax
Example
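The syntax and examples for these three functions are not reproduced in this text. A combined sketch, with hypothetical variable names:

```r
# Two variables of different types (names are illustrative)
my_num <- 10
my_text <- "text"

# class() reports the type of the object a variable holds
class(my_num)
class(my_text)

# ls() lists the variables currently defined in the workspace
ls()

# rm() deletes a variable; afterwards it no longer appears in ls()
rm(my_num)
"my_num" %in% ls()
```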
Scope of Variables in R programming
The location where we can find a variable and also access it if required is called the scope of a variable . There are mainly two types of variable scopes:
1. Global Variables
Global variables exist throughout the execution of a program.
As the name suggests, they can be accessed and changed from any part of the program.
- They are available throughout the lifetime of a program.
- They are declared anywhere in the program outside all of the functions or blocks.
Declaring global variables
Global variables are usually declared outside of all of the functions and blocks. They can be accessed from any portion of the program.
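The original code is not shown here. A minimal sketch using a variable called `global`; the functions `show_global()` and `bump_global()` are hypothetical names introduced for illustration:

```r
# Declared outside of any function, so it is a global variable
global <- 5

# A function can read a global variable directly
show_global <- function() {
  global
}

# To update it from inside a function, use the <<- operator
bump_global <- function() {
  global <<- global + 1
}

show_global()
bump_global()
show_global()
```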
In the above code, the variable ‘global’ is declared at the top of the program, outside all of the functions, so it is a global variable and can be accessed or updated from anywhere in the program.
2. Local Variables
Local variables exist only within a certain part of a program, such as a function, and are released when the function call ends. They cannot be accessed or used outside the function in which they are created. Note that in R only functions create a new scope; a braced block such as the body of an if statement or a for loop does not.
Declaring local variables
Local variables are declared inside a function.
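A short sketch with hypothetical names. It also shows that, in R, a variable created inside a plain braced block (as opposed to a function) remains visible afterwards:

```r
# local_var exists only while the function is running
double_it <- function() {
  local_var <- 10
  local_var * 2
}

double_it()
exists("local_var")   # the variable is gone once the call has ended

# Braces alone do not create a new scope in R
if (TRUE) {
  in_block <- 1
}
exists("in_block")    # still visible after the block
```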
Difference between local and global variables in R
- Scope A global variable is defined outside of any function and may be accessed from anywhere in the program, as opposed to a local variable.
- Lifetime A local variable’s lifetime is constrained by the function in which it is defined. The local variable is destroyed once the function has finished running. A global variable, on the other hand, doesn’t leave memory until the program is finished running or the variable is explicitly deleted.
- Naming conflicts Because a global variable can be accessed from anywhere in the program, naming conflicts may occur if the same variable name is used in different portions of the program. Local variables, by contrast, apply only to the function in which they are defined, which reduces the likelihood of naming conflicts.
- Memory usage Because global variables are kept in memory throughout program execution, they can eat up more memory than local variables. Local variables, on the other hand, are created and destroyed only when necessary, therefore they normally use less memory.
We have covered the concept of “ Variables in R ” to give you an overview of R variables. How to create variables in R, how to use them, and related questions have been answered in this article.
Hope you find it helpful, and implement it in your projects.
UC Business Analytics R Programming Guide
Assignment & evaluation.
The first operator you’ll run into is the assignment operator. The assignment operator is used to assign a value. For instance, we can assign the value 3 to the variable x using the <- assignment operator. We can then evaluate the variable by simply typing x at the command line, which will return the value of x . Note that prior to the value returned you’ll see ## [1] in the command line. The [1] indicates that the value that follows is the first element of the output. Note that you can add comments to your code by preceding them with the hash ( # ) symbol. Any values, symbols, and text following # on that line will not be evaluated.
Interestingly, R actually allows for five assignment operators:
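The list itself is not reproduced in this text. The five forms are sketched below, each assigning the value 3 to `x`:

```r
# Leftward forms
x <- 3
x = 3
x <<- 3

# Rightward forms
3 -> x
3 ->> x

# Evaluate the variable
x
```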
The original assignment operator in R was <- and has continued to be the preferred among R users. The = assignment operator was added in 2001 primarily because it is the accepted assignment operator in many other languages and beginners to R coming from other languages were so prone to use it. However, R also uses = to associate function arguments with values (i.e. f(x = 3) explicitly means to call function f and set the argument x to 3). Consequently, most R programmers prefer to keep = reserved for argument association and use <- for assignment.
The operator <<- is normally only used inside functions, which we will not go into in detail here. The rightward assignment operators behave the same as their leftward counterparts; they just assign the value in the opposite direction.
Overwhelmed yet? Don’t be. This is just meant to show you that there are options and you will likely come across them sooner or later. My suggestion is to stick with the tried and true <- operator. This is the most conventional assignment operator used and is what you will find in all the base R source code…which means it should be good enough for you.
Lastly, note that R is a case-sensitive programming language, meaning all variables, functions, and objects must be called by their exact spelling:
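A minimal sketch of case sensitivity, using hypothetical variable names:

```r
# Only the lowercase spelling is defined
my_value <- 3

# The exact spelling is found
exists("my_value")

# A different capitalization refers to a different, undefined name
exists("My_Value")
```

Typing `My_Value` directly at the console would stop with an "object not found" error.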
Export Eigen template instantiation from a module
I'm trying to speed up my compilation:
https://www.reddit.com/r/cpp/comments/1fmbdl6/comment/lolznmk/?context=3
Inspired by
https://gitlab.com/libeigen/eigen/-/issues/1920
I'm trying to make this example work:
Currently, it fails (VS22) with call_assignment_no_alias, which I probably need to export.
Any easy, elegant way to export instantiation of Eigen types?
The assign() function in R can be used to assign values to variables. This function uses the following basic syntax: assign(x, value), where x is a variable name, given as a character string, and value is the value(s) to be assigned to x. The following examples show how to use this function in practice.
In R, I'm writing a for-loop that will iteratively create variable names and then assign values to each variable. Here is a simplified version. The intention is to create the variable's name based on the value of iterating variable i, then fill the new variable with NA values. (I'm only iterating 1:1 below since the problem occurs isn't related ...
The Basics of R's Assign. Assign is among the more unassuming functions in R's toolkit. At first glance, it seems like one of those cruft functions that often persist in a language long after its usefulness is over. This is because by default the assign function is just a slightly more verbose way of enacting a standard variable assignment.
Variables in R can be assigned in one of three ways. Assignment Operator: "=" used to assign the value.The following example contains 20 as value which is stored in the variable 'first.variable' Example: first.variable = 20. '<-' Operator: The following example contains the New Program as the character which gets assigned to 'second.variable'.
Assigning a Value to a Variable. In R, we state values directly in the chunk or the console, e.g.: 3. [1] 3. Here, we just state 3, so R simply "throws" that right back at you! Now, if want to "catch" that 3 we have to assign it to a variable, e.g.: x <- 3. Notice how now we "catch" the 3 and nothing is "thrown" back to you ...
assign() is a function in R that allows us to create a new variable and assign it a value or assign a new value to an existing variable. While this might seem straightforward and similar to the traditional assignment operator <-or =, assign() is more flexible as it allows dynamic assignment of variable names.In this article, we will explore the assign() function in depth, discussing its syntax ...
For Assignments. The <- operator is the preferred choice for assigning values to variables in R. It clearly distinguishes assignment from argument specification in function calls. # Correct usage of <- for assignment x <- 10 # Correct usage of <- for assignment in a list and the = # operator for specifying named arguments my_list <- list (a = 1 ...
On this page you'll learn how to apply the different assignment operators in the R programming language. The content of the article is structured as follows: 1) Example 1: Why You Should Use <- Instead of = in R. 2) Example 2: When <- is Really Different Compared to =. 3) Example 3: The Difference Between <- and <<-. 4) Video ...
Description. R programming has three ways of assigning values to a variable. They are =, <-, and the assign function. The assign function has the basic format of assign ("variable", value) where "variable" is the name of the variable receiving the values, note that it needs to be in quotes, and "value" is the value being assigned to ...
a variable name, given as a character string. No coercion is done, and the first element of a character vector of length greater than one will be used, with a warning. value. a value to be assigned to x. pos. where to do the assignment. By default, assigns into the current environment. See 'Details' for other possibilities.
See the main functions to manipulate data in R such as how to subset a data frame, create a new variable, recode categorical variables and rename a variable ... There are three ways to assign an object in R: <-= assign() # 1st method x <- c(2.1, 5, -4, 1, 5) x ... To scale one or more variables in R use scale(): dat_scaled <- scale(dat_imputed ...
Variable assignment. A basic concept in (statistical) programming is called a variable. A variable allows you to store a value (e.g. 4) or an object (e.g. a function description) in R. You can then later use this variable's name to easily access the value or the object that is stored within this variable. You can assign a value 4 to a variable ...
6. @Jasha <<- will search up the chain of enclosures up to the global environment and assign to the first matching variable it finds. Hypothetically, if you have a function f() nested in a closure g() and a exists in g(), then using a <<- in f() will assign to a in g(), not to the global environment. Oftentimes, this is what you want, however.
David, in my actual data set, students can also have "pre-majors" if they haven't been formally admitted to a major. If a student has only a pre-major, I would assign that as their primary major. But if they have both a pre-major and a major, I would assign the major as their primary major.