1. The relationship between word length and age of acquisition

  • Kuperman et al. (2012) gathered age of acquisition estimates for a large collection of words via Amazon Mechanical Turk
  • Here we use a sample of 1000 words
  • Research question:
    • Are shorter nouns acquired earlier in life?
  • Hypothesis:
    • Word length correlates positively with age of acquisition
  • Null hypothesis:
    • Word length is not correlated with age of acquisition

1.1 Loading and exploring the data

    # Load the readr package
    library(readr)

    # Load the course data from the course website to the object 'dataSet':
    # http://www.jeroenclaes.be/statistics_for_linguistics/datasets/class5_Kuperman_et_al_2012.csv
    dataSet <- read_csv("http://www.jeroenclaes.be/statistics_for_linguistics/datasets/class5_Kuperman_et_al_2012.csv")

    # Load the dplyr package
    library(dplyr)

    # Print a 'glimpse' of the dataSet
    glimpse(dataSet)

    # Print a summary of the AgeOfAcquisition column
    summary(dataSet$AgeOfAcquisition)

    # Print a summary of the WordLength column
    summary(dataSet$WordLength)

1.2 A first visual inspection of the relationship

    # The data frame dataSet is already in your workspace

    # Load the ggplot2 package
    library(ggplot2)

    # Draw a scatterplot of WordLength (x-axis) vs AgeOfAcquisition (y-axis)
    # Add a linear (lm) trendline
    ggplot(dataSet, aes(x = WordLength, y = AgeOfAcquisition)) +
      geom_point() +
      geom_smooth(method = "lm")

    # What do you see? Is the relationship monotonic and linear?
    # The relationship is monotonic: when word length goes up, there is also an increase in AgeOfAcquisition
    # The relationship is not linear: when word length goes up by one unit, there is not a constant
    # increase in the values of AgeOfAcquisition across the word length continuum

1.3 Testing the assumptions of Pearson’s r: bivariate normal distribution and sample size

  • The plot tells us that the relationship is not linear, but it is monotonic
  • This already tells us that we cannot use Pearson’s correlation without transforming the data
  • For practice, let’s see if the data satisfy the other assumptions

    # The data frame dataSet is already in your workspace

    # Load the energy package
    library(energy)

    # Perform a mvnorm.etest to see if AgeOfAcquisition and WordLength have a
    # bivariate normal distribution. Use 1000 replicates
    mvnorm.etest(dataSet[, c("AgeOfAcquisition", "WordLength")], 1000)

    # Check the sample size by computing the number of rows in dataSet
    nrow(dataSet)

    # What does the test tell you, do the two columns have a bivariate normal distribution?
    # p < 0.05, so the data do not follow a bivariate normal distribution.
    # Note that this is not that big of an issue, because the sample size is large

1.4 Testing the assumptions of Pearson’s r: homoskedastic data

    # The data frame dataSet is already in your workspace

    # Load the car package
    library(car)

    # Specify a linear regression model 'mod' that regresses AgeOfAcquisition on WordLength
    mod <- lm(AgeOfAcquisition ~ WordLength, dataSet)

    # Perform a ncvTest on this regression model
    ncvTest(mod)

    # What does the test tell you, are the data heteroskedastic or homoskedastic?
    # p < 0.05 confirms what we saw on the plot: the relationship is not linear
    # and the data are heteroskedastic
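
To make the idea behind the non-constant variance test more concrete, here is a minimal base-R sketch of a Breusch-Pagan-style score test, the kind of test that ncvTest performs. It uses simulated data, not the course data: the variables x and y, the seed, and all parameter values are assumptions made only for this illustration.

```r
# Simulated data (hypothetical): the spread of y grows with x
set.seed(1)
n <- 500
x <- runif(n, 1, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 0.3 * x)

# Fit the regression and take its residuals
fit <- lm(y ~ x)
e <- residuals(fit)

# Scale the squared residuals by their mean
u <- e^2 / mean(e^2)

# Auxiliary regression of the scaled squared residuals on the fitted values
aux <- lm(u ~ fitted(fit))

# Score statistic: half of the regression sum of squares of the auxiliary model
reg_ss <- sum((fitted(aux) - mean(u))^2)
chi_sq <- reg_ss / 2

# Compare against a chi-squared distribution with 1 degree of freedom
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)
p_value
```

A small p-value here indicates that the residual variance changes along the fitted values, which is what heteroskedasticity means.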

1.5 Testing the assumptions of Pearson’s r: independent data

    # The data frame dataSet is already in your workspace

    # Load the car package
    library(car)

    # Specify a linear regression model 'mod' that regresses AgeOfAcquisition on WordLength
    mod <- lm(AgeOfAcquisition ~ WordLength, dataSet)

    # Perform a Durbin-Watson test on this regression model
    durbinWatsonTest(mod)

    # What does the test tell you, are the data autocorrelated?
    # p > 0.05 tells us that autocorrelation is not an issue
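
As a rough illustration of what the Durbin-Watson test measures, the statistic itself can be computed by hand from the residuals. The sketch below uses simulated, independent data (the variables, seed, and coefficients are assumptions for this example only) and does not reproduce the p-value that durbinWatsonTest reports.

```r
# Simulated data (hypothetical) with independent errors
set.seed(42)
n <- 1000
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

fit <- lm(y ~ x)
e <- residuals(fit)

# Durbin-Watson statistic: squared differences between successive
# residuals, relative to the total of the squared residuals
dw <- sum(diff(e)^2) / sum(e^2)
dw
```

Values close to 2 suggest no first-order autocorrelation; values near 0 or 4 point to positive or negative autocorrelation. Because the statistic compares each residual with the next one, it depends on the order of the rows in the data.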

1.6 Performing the correlation tests

  • The previous exercises all point in the same direction. The data are not suited to Pearson’s correlation test:
    • The relationship is not linear
    • The relationship is heteroskedastic
    • The data do not follow a bivariate normal distribution (less important, as the sample size is large)
  • This leaves the rank-based Spearman’s rho and Kendall’s tau as our options

    # The data frame dataSet is already in your workspace

    # Perform a cor.test with the 'spearman' method. Recall that our hypothesis
    # is directional: it predicts a positive correlation
    cor.test(dataSet$AgeOfAcquisition, dataSet$WordLength, alternative = "greater", method = "spearman")

    # Perform a cor.test with the 'kendall' method. Recall that our hypothesis
    # is directional: it predicts a positive correlation
    cor.test(dataSet$AgeOfAcquisition, dataSet$WordLength, alternative = "greater", method = "kendall")

    # What do the two tests tell you?
    # - Is the relationship between the two variables strong?
    #   The relationship is weak to moderate (rho and tau around 0.3)
    # - Is the relationship positive or negative?
    #   The relationship is positive, the coefficients are greater than zero
    # - Is it significant?
    #   p < 0.05, so we can consider the relationship to be weak, but significant
    # - Which test yields the most extreme estimate?
    #   As is often the case, Spearman's rho is more extreme than Kendall's tau
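
To see why rank-based coefficients cope with a monotonic but non-linear relationship, the following sketch compares the three coefficients on simulated data. The data are hypothetical (an exact exponential relationship chosen for illustration) and are not the course data.

```r
# Simulated data (hypothetical): y increases strictly with x, but not linearly
set.seed(123)
x <- runif(200, 1, 10)
y <- exp(x / 2)

# Pearson's r measures linear association on the raw values
pearson_r <- cor(x, y, method = "pearson")

# Spearman's rho and Kendall's tau only use the ordering of the values
spearman_rho <- cor(x, y, method = "spearman")
kendall_tau <- cor(x, y, method = "kendall")

# Spearman's rho is Pearson's r computed on the ranks
rho_from_ranks <- cor(rank(x), rank(y))

c(pearson = pearson_r, spearman = spearman_rho, kendall = kendall_tau)
```

Because the ordering of y follows the ordering of x exactly, the two rank-based coefficients reach 1, while Pearson's r stays below 1 because the points do not fall on a straight line.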

2. The relationship between degree of synthesis and isolation across languages

  • Greenberg (1960) calculated different indices based on a large collection of texts for different languages (the data were donated by Freek Van de Velde, KU Leuven):
    • Synthesis_Index: an indicator of the degree of syntheticity of a language. It expresses the ratio of morphemes to words (i.e., how many morphemes occur in a typical word). Ranges from 0 to +Inf, but values above 3 do not occur
    • Isolation_Index: an indicator of the number of words that are necessary to convey a relationship between words at the level of the sentence. Ranges from 0 (no words) to 1 (one word)
    • Since the two indices are on different scales, they were transformed to z-scores
  • Our hypothesis is the following:
    • Languages high on the Synthesis index will be low on the Isolation index and vice versa
  • Our null hypothesis states:
    • There is no relationship between syntheticity and isolation in language
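
The z-score transformation mentioned above can be sketched as follows. The raw values below are made up for illustration and are not Greenberg's data; the course data set already contains the transformed indices.

```r
# Hypothetical raw index values for five languages (not Greenberg's data)
synthesis_raw <- c(1.1, 1.7, 2.0, 2.6, 3.0)
isolation_raw <- c(0.75, 0.50, 0.40, 0.20, 0.10)

# z-score: subtract the mean, divide by the standard deviation
z_manual <- (synthesis_raw - mean(synthesis_raw)) / sd(synthesis_raw)

# scale() performs the same computation; as.numeric() drops the matrix attributes
z_synthesis <- as.numeric(scale(synthesis_raw))
z_isolation <- as.numeric(scale(isolation_raw))

# Pearson's r is unaffected by the transformation
r_raw <- cor(synthesis_raw, isolation_raw)
r_z <- cor(z_synthesis, z_isolation)
```

After the transformation both variables have mean 0 and standard deviation 1, so they can be plotted and compared on the same scale.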

2.1 Loading and exploring the data

    # Load the readr package
    library(readr)

    # Load the course data from the course website to the object 'dataSet':
    # http://www.jeroenclaes.be/statistics_for_linguistics/datasets/class5_Greenberg_1960.csv
    dataSet <- read_csv("http://www.jeroenclaes.be/statistics_for_linguistics/datasets/class5_Greenberg_1960.csv")

    # Load the dplyr package
    library(dplyr)

    # Print a 'glimpse' of the dataSet
    glimpse(dataSet)

    # Print a summary of the Synthesis_Index column
    summary(dataSet$Synthesis_Index)

    # Print a summary of the Isolation_Index column
    summary(dataSet$Isolation_Index)

2.2 A first visual inspection of the relationship

    # The data frame dataSet is already in your workspace

    # Load the ggplot2 package
    library(ggplot2)

    # Draw a scatterplot of Isolation_Index (x-axis) vs Synthesis_Index (y-axis)
    # Add a linear (lm) trendline
    ggplot(dataSet, aes(x = Isolation_Index, y = Synthesis_Index)) +
      geom_point() +
      geom_smooth(method = "lm")

    # What do you see? Is the relationship monotonic and linear?
    # The relationship is monotonic: when Synthesis_Index goes up, there is a decrease in Isolation_Index
    # The relationship is linear: when Synthesis_Index goes up by one unit, there is a constant
    # decrease in the values of Isolation_Index across the Synthesis_Index continuum

2.3 Testing the assumptions of Pearson’s r: bivariate normal distribution and sample size

  • The plot tells us that the relationship is linear and monotonic
  • This tells us that Pearson’s r could be a valid correlation statistic for our data
  • Let’s see if the data also satisfy its other assumptions

    # The data frame dataSet is already in your workspace

    # Load the energy package
    library(energy)

    # Perform a mvnorm.etest to see if Synthesis_Index and Isolation_Index have a
    # bivariate normal distribution. Use 1000 replicates
    mvnorm.etest(dataSet[, c("Synthesis_Index", "Isolation_Index")], 1000)

    # Compute the sample size by calculating the number of rows in dataSet
    nrow(dataSet)

    # What does the test tell you, do the two columns have a bivariate normal distribution?
    # p > 0.05, so the data follow a bivariate normal distribution.
    # Note that this is important here, as the sample size is so small

2.4 Testing the assumptions of Pearson’s r: homoskedastic data

    # The data frame dataSet is already in your workspace

    # Load the car package
    library(car)

    # Specify a linear regression model 'mod' that regresses Synthesis_Index on Isolation_Index
    mod <- lm(Synthesis_Index ~ Isolation_Index, dataSet)

    # Perform a ncvTest on this regression model
    ncvTest(mod)

    # What does the test tell you, are the data heteroskedastic or homoskedastic?
    # p > 0.05 confirms what we saw on the plot: the relationship is linear
    # and the data are homoskedastic

2.5 Testing the assumptions of Pearson’s r: independent data

    # The data frame dataSet is already in your workspace

    # Load the car package
    library(car)

    # Specify a linear regression model 'mod' that regresses Synthesis_Index on Isolation_Index
    mod <- lm(Synthesis_Index ~ Isolation_Index, dataSet)

    # Perform a Durbin-Watson test on this regression model
    durbinWatsonTest(mod)

    # What does the test tell you, are the data autocorrelated?
    # p > 0.05 tells us that autocorrelation is not an issue

2.6 Performing the correlation test

  • The previous exercises all point in the same direction. The data are well suited to Pearson’s correlation test:
    • The relationship is linear and monotonic
    • The relationship is homoskedastic
    • The data follow a bivariate normal distribution (this is important, because the sample size is small)
  • We can use Pearson’s r

    # The data frame dataSet is already in your workspace

    # Perform a cor.test with the 'pearson' method. Recall that our hypothesis
    # is directional: it predicts a negative correlation
    cor.test(dataSet$Synthesis_Index, dataSet$Isolation_Index, alternative = "less", method = "pearson")

    # Calculate the r-squared value of the relationship
    cor(dataSet$Synthesis_Index, dataSet$Isolation_Index)^2

    # What do the tests tell you?
    # - Is the relationship between the two variables strong?
    #   The relationship is very strong
    # - Is the relationship positive or negative?
    #   The relationship is negative, the coefficient is smaller than zero
    # - Is it significant?
    #   p < 0.05, so we can consider the relationship to be strong AND significant
    # - How much of the variance of Isolation_Index is explained by Synthesis_Index?
    #   The Synthesis_Index explains about 71% of the variance of Isolation_Index
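
The r-squared value obtained by squaring Pearson's r is the same quantity that a simple linear regression reports as R-squared. The sketch below checks this on simulated data; the variables synthesis and isolation, the seed, and the coefficients are assumptions for illustration and do not reproduce the Greenberg data or the 71% figure.

```r
# Simulated z-score-like data (hypothetical) with a strong negative relationship
set.seed(7)
n <- 30
isolation <- rnorm(n)
synthesis <- -0.8 * isolation + rnorm(n, sd = 0.5)

# Pearson's r and its square
r <- cor(synthesis, isolation)
r_squared <- r^2

# R-squared from a simple linear regression on the same two variables
fit <- lm(synthesis ~ isolation)
lm_r_squared <- summary(fit)$r.squared

# One-sided test of the directional hypothesis (negative correlation)
test <- cor.test(synthesis, isolation, alternative = "less", method = "pearson")

c(r = r, r_squared = r_squared, lm_r_squared = lm_r_squared, p = test$p.value)
```

Squaring removes the sign, so r-squared expresses only the share of variance that the two variables have in common; the direction of the relationship has to be read from r itself.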

References

  • Greenberg, J. (1960). A quantitative approach to the morphological typology of language. International Journal of American Linguistics 26(3). 178–194.
  • Kuperman, V., Stadthagen-Gonzalez, H., Brysbaert, M. (2012). Age-of-acquisition ratings for 30,000 English words. Behavior Research Methods 44(4). 978–990.

Acknowledgements

A warm thank-you goes out to Freek Van de Velde (KU Leuven), who generously donated the Greenberg (1960) data for the second exercise.

© 2018 Jeroen Claes