
Semester 1 notes

NOTES SEMESTER 1
FIRST DAY (1st Oct 2010)
Need to average 50%+ across all units, with at least 40% in each unit to pass. 40 hours per week.

Class interests:
- Huw Davies: Sociology, then analyst programmer, then teacher.
- Paul: Economics, environmental, then web applications developer.
- Lisa: Law, criminology, prescription drug misuse.
- Jack: Physicist, then BP, marketing, trading.
- Jaymie: Law, politics, then MOD Strategic Analyst.
- Katie: IT and Organisations.
- Olivier: Sociology and Psychology, New Technologies and WOW.
- Richard G.: ECS student, privacy, social networks.
- Huw: Law, then Computer Science, then web developer for a betting company.
- James: ECS.
- Will: ECS, copyright.
- Chris: ECS.
- Chris: Management, then Project Development.
- Craig: Psychology.
- Peter: Marketing and Computing.
- Terhi: Archaeology and Museums; cuneiform and the Semantic Web.

PhD students:

Oct 2010 Jan 2011

- Ramain: ECS. How applications are designed, how users' needs are perceived. Design of apps.
- Sarosh: Law. Blogs: implications of access to data in employment law.
- Simon: Elec. Engineering. AI. Web behaviours and how they manifest at a low level.

- Connor: Micro/macro scale. Modelling social network system interactions. Trust.
- Tom: ECS. Politics and Law, linked data and the law.
- Laura: Law. Open access to academic work online.
- Russell: IT and Organisations. Internet law.

Useful emails:
- Csmscws-all@ecs.soton.ac.uk: Students
- las@ecs.soton.ac.uk: Leslie Carr
- mjw@ecs.soton.ac.uk: Mark Weal
- cjp@soton.ac.uk: Catherine Pope

COMP6037 Foundations of Web Science
4/10/10
Assignments: Book Review 10%; Wiki (collaborative) 90%. DO a 500-word book review and a group presentation.
Look at how the book you are assigned contributes to the Web: what does it bring to the Web?

6/10/10
15/10: Book Review hand-in date. 18/10 and 20/10: Book Review group presentations. Handin.ecs.soton.ac.uk is where all the assignments are handed in.

LAW: Précis: reducing a longer text into a shorter form. E.g. the Daily Mail article says: Prince Charles's model "Perfect English Village", Poundbury, halfway through its development of 1,200 houses and purporting to be pedestrian-friendly, has been criticised by the Ramblers Association as an urban ghetto that restricts access to the countryside, forcing access to the village by car or along a busy main road, with added criticism that houses were badly built and the layout encouraged crime. The Prince's spokesman says access is this way because it minimises the impact on livestock.
Key actors. Take out figures. But what about the impact? How can you judge this without figures? KEEP the tone.
Work with Craig, Jack and Phil to write a précis for the IVF Daily Mail article, and then summarise the arguments made in the article. Nicole: précis and conclusion; Phil: summary; Jack: against; Craig: for.

7/10/10
Gave presentations of articles from last session.

11/10/10
Tim Berners-Lee (TBL) TED lecture. Talks about a grass-roots movement. Difference between putting DOCUMENTS on the web and DATA on the web.
Documents Web = now; Data Web = future (TBL). You can't use data by itself.
TBL RULES OF LINKED DATA:
1 HTTP names for all conceptual things.
2 Looking up an HTTP name will fetch data in a standard format.
3 The data will have relationships, and the things it has a relationship with will also have HTTP names.
DBpedia: Linked Data. BUT problematic data: government and enterprise "database hugging" (TBL). Uses: e.g. curing cancer, understanding economies, Alzheimer's (example of linked data with two disciplines' data crossing over). Data is about our lives.

Social Network Systems (SNSs): repurposing your data. The SNSs don't link together: Walled Gardens. E.g. of Linked Data: openstreetmap.org. "Raw Data Now" (TBL).

CRITICISING TBL: Web Science is: collaborative, empowering. But he only talks about the good things; sales pitch; subjective. Need an idealistic goal for Web Science to move onwards. But releasing data about ourselves is very different from TBL releasing his internet protocols. Disciplines joining up. You design things the way you would imagine them to be used. The user is an engineering issue.
Is Web Science about: A building something, or B understanding the Web?
How do you define what data should be shared? Defining the purpose of the data. Personal data: at what point does the data become not private? E.g. 192.com is just the electoral roll in a different format.

Look at Nigel Shadbolt's talk on EdShare and compare his approach to TBL's. What is Web Science? How do the different evangelists approach this?

12/10/10 Web Graph. Les Carr
Difficult to represent websites as a networked map (a local map): less control over editorial content; hierarchy. Now everything is linked together: emerging networks as a global infrastructure. Human behaviour (choosing things to link to) produces the resulting shapes.
Nodes / vertices: pages, documents.
Edges: connect nodes together; links, or arcs.
SIZE OF A NETWORK = NUMBER OF VERTICES (expressed as N).
Two distance measures: the diameter of a network (i.e. how many links do you need to follow to get from one node to another?) and the average-case diameter.

Diameter: longest shortest path between any two nodes.
Average-case diameter (SMALL WORLD THEORY): average shortest distance between vertex and vertex (between all pairs of vertices).
Degree of vertex: number of edges connected to the vertex.
Density of network: ratio of edges to vertices.
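The measures above can be tried out on a toy graph. The following is only a sketch: the four-page `links` graph is made up for illustration, density follows the notes' definition (edges divided by vertices), and path lengths are taken only over pairs that are actually connected by a directed path.

```python
from collections import deque

# Hypothetical four-page web: each key links to the pages in its list.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def shortest_paths_from(graph, start):
    """Breadth-first search: number of links from start to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

n_vertices = len(links)  # size of the network, N
n_edges = sum(len(targets) for targets in links.values())

out_degree = {node: len(targets) for node, targets in links.items()}
in_degree = {node: 0 for node in links}
for targets in links.values():
    for target in targets:
        in_degree[target] += 1

# Shortest path lengths over ordered pairs that are connected.
path_lengths = [
    d
    for start in links
    for target, d in shortest_paths_from(links, start).items()
    if target != start
]
diameter = max(path_lengths)                         # longest shortest path
average_path = sum(path_lengths) / len(path_lengths)  # average-case diameter
density = n_edges / n_vertices                        # ratio of edges to vertices
```

Page C has the highest in-degree here, which is the small-scale version of a "popular node".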

The Web is a DIRECTED GRAPH: degree = in-degree (links going in) + out-degree (links going out).
DEGREE DISTRIBUTION of the Web:

Power Law distribution: no meaningful average. Very low degrees are most likely; very high degrees are individually rare but likely in aggregate.
Normal distribution: very high and very low degrees are both unlikely.
Popular nodes have millions of links. Therefore the network appears to have no scale.
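A quick way to see the difference between the two distributions is to compare the largest value with the typical (median) value in a random sample. This is a sketch only: the Pareto shape 1.2 and the normal mean 10 and standard deviation 2 are assumed parameters picked for illustration, not figures from the lecture.

```python
import random
import statistics

rng = random.Random(2010)  # fixed seed so repeated runs give the same samples

# Assumed parameters, chosen only for illustration.
heavy_tailed = [rng.paretovariate(1.2) for _ in range(10_000)]  # power-law-like
bell_shaped = [rng.gauss(10, 2) for _ in range(10_000)]         # normal

def max_over_median(values):
    """How many times larger the biggest value is than the typical one."""
    return max(values) / statistics.median(values)

heavy_ratio = max_over_median(heavy_tailed)
bell_ratio = max_over_median(bell_shaped)
```

In the normal sample the largest value stays close to the median, while in the heavy-tailed sample a few values dwarf it, which is why an "average page" says little about the web.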

THE WEB:

Bow tie diagram from 2000. How would you browse the web without a search engine? Google uses the PageRank algorithm. Barabási and Kleinberg.
Why do we have a bow tie? PREFERENTIAL ATTACHMENT can lead to a POWER LAW DISTRIBUTION: make a new node, then link it to other nodes preferentially. You're more likely to link to things according to how many links they already have; a bigger site will be linked to more because you know about it.
SCC (strongly connected component): diameter 28. Whole bow tie graph: 500 in diameter. Probability of a path between randomly chosen pairs: 24%. From that: average directed path length = 16; average undirected path length = 6.
SOFTWARE: NetLogo, to simulate maths models.

14/10/10 Catherine Pope. The Web Social
Gauntlett, 2004. Web Studies.

Schools of thought: Web Studies, and Sociology and the Internet. Melissa Gilbert works on the digital divide in the US. SOCIAL CAPITAL: internet access can lead to improvement in life (i.e. networks). DiMaggio: Social Implications of the Internet. Also OII papers.
Is there a distinct sociology of the web?
Sociology: Micro and Macro | Qualitative and Quantitative Methods
Are interactions on the web the same as in the real world? Or is something unique happening?
THEMES to a sociology of the web:
- Power: the dark side of the web. Deviance.
- Identity: individuals, interaction.
- Institutions and power.
- Societal structures: class, equality, politics, work and organisations.
So: Sociology of Technology / Knowledge. Approaches:
1 Technological determinism: determined by non-social laws. Is it progress / positive? So technology causes social change. TBL? Inevitability of tech. det. Causal powers of tech. det. Susan Greenfield: Facebook.
2 Social construction of technology: shaping is economic, political, cultural, social. Capitalism (Google censoring in China), military, responses (Galileo and the Church), affordances/viability. MacKenzie and Wajcman, 2007. The Social Shaping of Technology. Black boxing. And need to understand social groups and interpretative flexibility.
3 Critique of social determinism

Pinch and Bijker, 1984. The Social Construction of Facts and Artefacts. Social Studies of Science, 14: 399-441.
Oppenheimer defence ("I just made the bombs"). Technology is about choices. Science informing technology: science = discovery. Politics, funding, paradigms SHAPE science, which SHAPES technology?
ACTOR NETWORK THEORY (Susan Halford): Bruno Latour, 2005. Reassembling the Social.

Technology and the social: mutually constitutive? Donna Haraway: feminist. Cyborgs: machine and organism. 1991. Simians, Cyborgs and Women. Lucy Suchman: anthropology, ethnography, Human Computer Interaction (HCI). 2007. Human-Machine Reconfigurations.

18/10/10 Book Review Presentations:
Clay Shirky, Here Comes Everybody. Lisa.
- Overview. In line with disciplinary approaches.
- Lots of case studies. Adaptation of tools (SNS).
- No bad examples: all successful.
- Element of luck.
- Quite repetitive.

Chris Anderson, The Long Tail. Jack, Paul.
The long tail: challenges to economic theory and practice. Why is there a long tail?
o Democratised production: personal computers
o Democratised distribution: web
o Finding stuff you'd like: recommendations
o Promotes niche cultures: WOW and group formations
o Can escape the tyranny of locality.

Jonathan Zittrain, The Future of the Internet. Jaymie, Terhi.
Generative nature of products (i.e. easy to adapt, etc.) also leads to bad stuff (viruses). Closed, non-generative platform future. Alternative futures: a shapable internet/web. Uses the example of Wikipedia: generative; even works with bad comments, etc. Herdict: designed by the author; telling other users about viruses on their pages. Allowing tools and community to manage the content of the web. Passive users ignored.
Difficult to place where the book sits: airport reading or academic? But his ideas are very good re. managing a community online, i.e. the Wikipedia approach applied to the whole web. Remove the reasons why and look at the actual suggestions: it's a very useful book. Generative is something that can be modified, not finished.

20/10/10 Book Review Presentations cont'd.
The Spy in the Coffee Machine. Huw, Hu, Olivier.
The human in the loop is the weakness of any privacy system.

Chris Anderson, Free: The Future of a Radical Price.
Follow-up to The Long Tail. Labour exchange: search page ranking on Google improves the targeting algorithm. The idea of being locked into free: is this true?

Groundswell. Huw, Chris, Chris.
How businesses can better manage their online SNS mentions, i.e. brand ambassadors; presence on SNSs; customer involvement in development of products. Groundswell is a result of changes in society. How are people using the web? And how can businesses best make use of their uses? Is the groundswell a purely online phenomenon? Notion of social legitimacy (Chris).

21/10/10
Did graphs for this lecture. Used NWT. This lecture made no sense at all.

25/10/10 Assignment: Wiki
4 components of coursework:
1. Collaborative research
2. Individual summary
3. Peer review and editing
4. Revision and collation
Deadline next year. 200 hours, 20 credits, = c. 5,000 words. Using MediaWiki. You have to sign your contributions for ease of reference.
Components of the wiki:
1. Overview: what is Web Science? 1,000 words
2. Web Science Conference: 1,000 words
3. External disciplines (take two): 1,000 words
4. Technical: graphs, web, social networks, etc. 1,000 words
5. Shared area: 1,000 words.
For external disciplines: why is this discipline significant? And take a topic from the discipline to discuss.
For Web Science Conference: looking at previous WebSci conferences, finding the forming communities and threads through a few papers.
The wiki is an overall exploration of Web Science. We need to prove that we can identify important themes coming out of Web Science. The wiki should take the same format: 1. Overview; 2. Technical; 3. WebSci Conference; 4. Shared Area.

COMP6046 Computational Thinking
4/10/10
J. Wing, 2006. Computational Thinking, Communications of the ACM, v.49, n.3.
Core text: J. Glenn Brookshear, Computer Science: An Overview. Get this book.
Assignments:
1 Journalistic article: 30%, 1500 words
2 HE/FE school teaching activity: 20%, group work
3 Public engagement lecture: 50%, group work. This will end with an evening in December with external visitors.

7/10/10 Les Carr. Computer Science, WTF?
ABSTRACTION as a tool for controlling complexity.
ALGORITHMS: often abstract. Rules by which a process is carried out (embodied as a program).
TRANSISTOR: amplifier/switch. Terminals: collector, base, emitter.
NAND, AND, OR, NOT, XOR: like binary logic values. 2 transistors = a logic gate. 125 million transistors in a chip.

11/10/10 Hugh Davis. TIMELINES
Look at the del.icio.us tags for this subject (in the unit homepage on ECS intranet): timeline soton COMP6046. Looking at the Illinois University project to put together a learning technologies timeline.
Web Timeline (with Jaymie and Phil):
1 Geocities 1994
2 W3 Consortium 1994
3 Open Source Crowdsourcing
4 Sex.com being sold
5 Napster peer to peer 1999
Overall timeline with class:

1968: Engelbart demonstrates NLS (the oN-Line System): "the mother of all demos". Collaborative working and video-voice conferencing. You can find this on YouTube (also in ELGG).

14/10/10 Web Future of Computation. Les Carr
Technological determinism: linked to Moore's Law?
Human computation: Steve Fossett search on Amazon Mechanical Turk, looking at photos to find the plane.
Crowdsourcing: Galaxy Zoo; V&A image database; ESP Game (gwap); reCAPTCHA; inputting MPs' expenses for journalists.

CYBORG COMPUTING:

SELECTION: who? Human / computer activity, i.e. where is the judgement?
CALCULATION: who is doing this? For whom? Human / computer.
ORGANISATION: who? The Web, most of the time.

Re-evaluation of community effects online. Role of humans: mechanisms; motivations, i.e. show results and stats?

18/10/10 Grady Booch video. The Promise, The Limits, The Beauty of Software. Yahoo Video. Ppt. http://edshare.soton.ac.uk/5847
WHAT ARE THE LIMITS OF SOFTWARE? What can and can't it do?
CAN: amplify human intelligence; analyse knowledge; classify; define.
CAN'T: replace judgement; create knowledge.
KNOWLEDGE / INFORMATION / DATA: all different. However, the application of knowledge = data production.
KNOWLEDGE: understanding; demonstration of; actionable information; APPLICATION OF. Example: "Need to take an umbrella".
INFORMATION: limitations/errors; description; interpretation; CONTEXTUALISED. Example: "3.25cm of rain will fall today".
DATA: quantity; numbers; RAW. Example: "3.25".

When Google looks for statements it outputs data, not knowledge, e.g. Q: what is the capital city of France? A: Paris is the capital of France. Transforming the data, but not knowledge. Someone has said this once on the web; Google hasn't understood what you've asked.
Software is invisible: "intellectually complex artefact" (Booch). Problems of design: the more complex the requirements, the more likely it is that the design will fail.

21/10/10 Grady Booch Keynote again. My notes from the video:
Practical limits to what we want to do. And moral and ethical limits to what we want to build.
Computer museum is now collecting software, including source code. How do you curate that? Complex nature of the artefacts. Issues with preservation of history with email and other software-intensive mechanisms. And new forms of artistic expression (i.e. music industry).
Effects of the laws of physics on software. The architectures of companies are accidental. Continuously evolving software systems, as we can't turn them off at any point.
Scale of software: a problem. Yahoo: 10 million+ lines of code. Represents your legacy. A capital investment. How can you preserve intellectual design decisions for the future?
Predictions: now, software transparent/hidden. 2020, software unavoidable. 2030, rise of the machines.

Booch Keynote discussion from class:
Legacy of software. Software is fragile. Hidden dependencies. Words like: history, excavations, archive. Lots of lines of code. Anecdotes, e.g. Turing. Software unavoidable: can't turn it off. Software is in the interstitial spaces (the spaces between cells): embedded?
Limits of software: physical (heat; energy used); controlling individual tiny components is getting harder as they get smaller.
Turing programming: the Church-Turing thesis is about how, regardless of your approach, you will end up with a computer or some form of programming.
E.g. Fedex sending out stuff across the world. The problem: looking for an optimal way to do what they do. The bigger the problem, the more time you need to do the calculations. Booch says of this problem: "There aren't enough seconds in the Universe." This is a limit of time (intractability); the HALTING PROBLEM is the separate limit that some things cannot be computed at all.

Another thought: with a monkey typing randomly, how long would it take him to type a sentence such as "my name is"? Defining the problem and the solution is easy, but the limit is the time you would need for this to happen. It will take too long.
The Turing Test makes "Can machines think?" a moot question. Put a computer in one room and a human in another. Ask them questions. Can you tell which is which? Turing changed the question to "Are machines able to pass this test?" It doesn't matter if you can't tell.
Problems of designing software within the limits of software and society. Look up Bob Daupher, Philosophy of Artificial Intelligence.

25/10/10 Nick Gibbins. Algorithms.
Searching:
- Exhaustive search: unsorted deck of cards. N cards in a deck; on average N/2 need to be searched.
- Interpolation search: sorted deck of cards. Need knowledge to know how to find the target.
- Binary search: a technique for sorted data, i.e. go right to the middle of the data. If the target is after that card, discount everything before it (and vice versa). Then split in the middle again. And again. The logarithm gives how many steps it takes to get to the answer: log2 n steps. So if there are 32 cards in a deck, log2 32 = 5 steps.
SORTING (slowest to fastest):
- Bubble sort (quadratic)
- Selection sort (quadratic)
- Insertion sort (quadratic)
- Merge sort (log linear)
- Quick sort (log linear)
- Shuffle sort: takes ages. Too long.
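The binary search from the Searching notes can be sketched as below. The 32-card `deck` is made up for illustration. Halving 32 cards down to a single card takes log2 32 = 5 halvings; this particular implementation can need one further look at that last card, so its worst case is 6 probes.

```python
import math

def binary_search(sorted_cards, target):
    """Return (index, probes); index is -1 if target is absent."""
    lo, hi = 0, len(sorted_cards) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2          # go right to the middle
        probes += 1
        if sorted_cards[mid] == target:
            return mid, probes
        if sorted_cards[mid] < target:
            lo = mid + 1              # target is after: discount everything before
        else:
            hi = mid - 1              # target is before: discount everything after
    return -1, probes

deck = list(range(1, 33))             # 32 sorted cards, numbered 1 to 32
halvings = math.log2(len(deck))       # 5.0
worst = max(binary_search(deck, card)[1] for card in deck)
```

An exhaustive search over the same deck would look at 16 cards on average, so the gap widens quickly as the deck grows.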

Radix sort: Herman Hollerith, 1887. Relies on being able to select (= compare) many cards at once. Encoding data in a machine-processable form. Cf. A. K. Dewdney's Spaghetti Sort.
Binary Search, Quick Sort and Merge Sort: recursive algorithms. Break the problem into sub-problems.
Big-O Notation: O(n): complexity of order n. The order of magnitude of complexity needs to be considered, i.e. an algorithm taking n steps and one taking 2n steps are treated as the same order, even though one takes twice as long, whereas n² is a higher order and takes longer than n as n grows.
Worst case complexity: not as useful as average case complexity.
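To make the quadratic versus log-linear distinction concrete, the sketch below counts comparisons for a plain bubble sort and a recursive merge sort on the same shuffled deck. The deck size of 256 and the seed are arbitrary choices for illustration, and the bubble sort deliberately has no early exit.

```python
import random

def bubble_sort(items):
    """Return (sorted copy, number of comparisons). Quadratic."""
    data = list(items)
    comparisons = 0
    for end in range(len(data) - 1, 0, -1):
        for i in range(end):
            comparisons += 1
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data, comparisons

def merge_sort(items):
    """Return (sorted copy, number of comparisons). Recursive, log-linear."""
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    left, left_count = merge_sort(items[:mid])    # sub-problem 1
    right, right_count = merge_sort(items[mid:])  # sub-problem 2
    merged = []
    comparisons = left_count + right_count
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons

rng = random.Random(6046)
deck = rng.sample(range(1000), 256)   # 256 distinct shuffled "cards"
bubble_result, bubble_steps = bubble_sort(deck)
merge_result, merge_steps = merge_sort(deck)
```

For 256 cards the bubble sort always makes 256 × 255 / 2 comparisons, while the merge sort stays below 256 × log2 256.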

COMP6044 Independent Disciplinary Review
5/10/10
Assignment: report on 2 disciplines and their relevance to an issue, e.g. what do sociology and politics have to say about privacy?
Blog weekly readings. A discipline representative will comment on the work in the blog once.
Outputs: understanding of the methods, approaches, tools, epistemology and underpinning concepts of two disciplines. Applying them to an issue in Web Science. 2500-word report: impact of x and y on z. Decide disciplines and issue in the first two weeks.
Class brainstorm for issues: Democratic Process, Censorship, Capital Value, Identity, Autonomy, Rewiring Brains, IPR, Copyright inc. Theft, Privacy, Monetisation, Revenue, Cognition.

12/10/10
Allocated topic next week. Assignment is going to be 2500 words in total; deadline will be week 12 or week 11.

19/10/10 Literature Review:

MY literature review: Sociology of Identity / Biology of Identity. In particular gender. Maybe online virtual communities? ONTOLOGIES: different countries / cultures define males and females differently. Les Carr has recently read a paper about this.
Don't read the research/journal literature. It's all about standard practice. Are there any things we can take from this for our dissertations?
2500-word report, + 8-10 minute presentation.
Format of report:
Discipline 1: Context. Main areas / theories / studies of relevance to the issue.
Discipline 2: Context. Main areas / theories / studies of relevance to the issue.
Understand issues simultaneously from these two perspectives. Synthesise them? Their different methods, i.e. limitations / benefits / possibilities. Not picking apart the issue itself.
To begin the exercise, identify some core readings. Les has a list of 09/10 students' readings.
TASK: going to use the current ECS blog. Introduce the subject in the ECS blog: why are you interested in studying this topic? Outline the disciplines you are going to use.

COMP6045 / COMP3016 Hypertext and Web Technologies
5/10/10
The World Brain: H.G. Wells, 1930s, microfiche.
Week 1: intro
Weeks 2,3,4: Standards for Web Content (XML)
Weeks 5,6,7: Research behind the Web (and the future)
Weeks 8,9,10: Issues around the Web
Then Nick Gibbins for Research.
Assignment: Exam 70%; Coursework on Weeks 5,6,7 30%.
DO exercises in groups. DON'T bother with core resources.

Class brainstorm: when you are looking at a webpage, what is happening to make this possible?
- IP: Internet Protocol; sends data in packets
- TCP: Transmission Control Protocol; transmission to a place
- URI / URL (subset of URI): Uniform Resource Identifier / Locator (U was Universal for a while)
- DNS: Domain Name System
- SSL: Secure Sockets Layer
- HTML / XML / XHTML
- Browser: IE, Mozilla, Chrome, Opera, WebKit, Lynx
- HTTP
- CSS: Cascading Style Sheets
- JavaScript: controls activities in the browser
- PHP / MySQL
- SPARQL
- Flash: adverts, videos; ActionScript
- PDF: Portable Document Format
- GIF, TIFF, JPG, EPS, PNG
- DOM: data structure
- FLV, AVI, OGG, MP4, MP3
- Hardware
- Server: Apache, IIS, Tomcat, GlassFish
- UTF-8: Unicode, i.e. other characters
TYPES OF SITES: content, multimedia, social networking, shopping, search engines, media, etc.
Before the Web: command-line messaging. FTP (File Transfer Protocol) commands, using modems, CYBERWEB. PostScript data: prints after sending. Downloading but not browsing.
Then HTML: a way to structure content on the screen. NCSA browser that INCLUDED pictures. Browsing was invented. Digital photography was coming in, and this fed into it.
USP of the Web = USER EXPERIENCE.

7/10/10
Content negotiation: the form of the resource returned to the browser is based on locale (language / format). Caveat: some resources will have separate URIs (i.e. the PDF).
HTTP message lines:
1 Request/Status line
2 Message headers
3 Blank line
4 Message body
Diagram: CLIENT (browser) sends an HTTP Request with the URI to the SERVER; the SERVER sends back a Response (HTML).

Parts of a URI:
scheme name   gap   host                    port   absolute path
http:         //    users.ecs.soton.ac.uk   :80    /index.html

HTTP/1.1 uses a GET request line plus a Host header:
telnet users.ecs.soton.ac.uk 80
GET /lac/test.html HTTP/1.1
Host: users.ecs.soton.ac.uk

telnet google.com 80
GET /search?q=term HTTP/1.0
Host: sparrow.ecs.soton.ac.uk
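The URI breakdown and the telnet session above can be reproduced without touching the network. This is a minimal sketch: `build_get_request` is a made-up helper name, and it only assembles the text that would be typed into telnet (request line, Host header, blank line, empty body).

```python
from urllib.parse import urlsplit

def build_get_request(url):
    """Build the text of a minimal HTTP/1.1 GET request for url."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",      # request line
        f"Host: {parts.hostname}",   # header required by HTTP/1.1
        "",                          # blank line ends the headers
        "",                          # empty message body
    ]
    return "\r\n".join(lines)

# Split the example URI into scheme, host, port and absolute path.
parts = urlsplit("http://users.ecs.soton.ac.uk:80/index.html")

request = build_get_request("http://users.ecs.soton.ac.uk/lac/test.html")
```

`parts.scheme`, `parts.hostname`, `parts.port` and `parts.path` correspond to the columns of the table above.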

GET: the resource
HEAD: just the file information
POST: forms
The interaction.
STATUS CODES (bold are the ones that you need to know):
200 Success
201 Created
300 Redirection
301 Moved permanently
302 Moved temporarily
303 See other: moved from POST to GET
Client error:
400 Bad request
403 Forbidden
404 File not found
500 Server error
Web Architecture (key components):
1 IDENTIFICATION
2 INTERACTION
3 FORMATS
PRINCIPLES OF THE WEB:
A All entities must have a URI
B URIs are de-referenceable
C Data must be in a standard format
D Data must be interlinked with other data

11/10/10 Les away. Terrible lecture.
HTML, SGML, XML.
Purpose        XML family   BASED ON
Construction   XML          SGML
Formatting     XSL          DSSSL
Linking        XLink        HyTime
Presentation   SML          HyTime
Description    RDF
XML (transport data) transforms to HTML (represent data).

TREE:

Tree diagram: ELEMENTS are FORMED OF NODES. A parent ELEMENT (ancestor) has child ELEMENT / ENTITY nodes (descendants); children of the same parent are siblings.

A tree can have lots of components.
ATTRIBUTE: a label of an element.
DTD: constrains elements. Provides the grammar.
Remember issues of element recognition, i.e. reserved characters need to be encoded: < is instead &lt;
Reserved attributes, like: xml:lang, xml:space, xml:units, where xml: is the namespace.
Declaration (Entity): links to a DTD that defines elements/attributes separately, e.g. <!ENTITY lecturer "Dr. Carr">
Character data sections: <![CDATA[ if ... ]]>
Comment: <!-- ... -->

Processing instruction: <? ... ?>
Remember to validate: http://validator.w3c.org
http://edshare.soton.ac.uk/383 for the exercises for this week.

14/10/10
Web Architecture: 1 identification; 2 interaction; 3 format.
HTML in browsers: LEARN THIS FOR THE EXAM.
HTML needs to be standards compliant and well formed (this happened gradually). How do we use SGML to validate HTML? SGML was for printing. Not entirely relevant. Then there was XML.
Validation of XML document:

1 Well-formed, i.e. tags are in the right place: 1) XML syntax OK; 2) built-in character entities only.
2 Validated, i.e. 1) uses pre-declared entities; 2) specified by DTD or Schema; 3) conforms to grammar of specific datatype.
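The first level, well-formedness, can be checked with Python's standard XML parser, as sketched below. `is_well_formed` is a made-up helper name. Note the limit of the sketch: this parser does not validate against a DTD or Schema, so it covers level 1 only, not level 2.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """True if the text parses as XML, i.e. the tags are in the right place."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

good = "<person><name/><age/><address/></person>"
bad = "<person><name></person>"   # <name> is never closed

good_ok = is_well_formed(good)
bad_ok = is_well_formed(bad)
```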

DTD defines: ordering; repeatability; labelling; vocab / schema / ontology.

DTD defined in an external entity: can be overridden in a local document.
DTD: <!ELEMENT tag word (# format)>
Sequences, i.e. <!ELEMENT person (name, age, address)>
The comma represents AND. So in the XML:
<person> <name/> <age/> <address/> </person>
Other options: person (name | age | address)
Element repetition:
,    AND
|    OR
?    OPTIONAL
+    REQUIRED AND REPEATABLE
*    OPTIONAL AND REPEATABLE
()   EXTRA PARENTHESES WILL GROUP

<!ELEMENT image EMPTY> - no content
<!ELEMENT buffer ANY> - any content
Attributes within an element. Attribute declarations:
<!ATTLIST para another CDATA #IMPLIED id ID #REQUIRED>
Keywords noted: #IMPLIED, #REQUIRED, #FIXED, string, token.

Entities: strings / external files / binary data formats.
<!ENTITY nes "Nicole Smith">
<!ENTITY pic1 SYSTEM "logo.gif" NDATA gif>
Entities in an XML document:
<!DOCTYPE book [<!ENTITY chap1 SYSTEM "ch1.xml">]>
Output pulls in the chapters from the XML documents:
<book> &chap1; &chap2; </book>
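A string entity like the ones above can be seen expanding when a document is parsed. This sketch covers only an internal string entity declared in the document's own DOCTYPE; the `note` element and its text are invented for illustration, and external (SYSTEM) entities such as the chapter files are not covered here.

```python
import xml.etree.ElementTree as ET

# The DOCTYPE's internal subset declares one string entity.
document = """<!DOCTYPE note [
  <!ENTITY lecturer "Dr. Carr">
]>
<note>Lecture given by &lecturer;</note>"""

root = ET.fromstring(document)
expanded_text = root.text   # the entity reference is replaced by its string
```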

http://edshare.soton.ac.uk/385 for the slides and exercises for validating with DTD.
DTD VALIDATOR: http://dtd-validator.ecs.soton.ac.uk

18/10/10 DOM Programming
Document Object Model. DOM standardises what you see of the XML data.
Tree: nodes; elements; text; entities; attributes; processing instructions.
Don't need to do DOM programming for the exam for this unit. XPath and XSLT are alternatives.
NODE is an API: 1 type; 2 name; 3 value.
Nodes are arranged in Node Lists and Named Node Maps.
http://users.ecs.soton.ac.uk/lac/xml/dom.html <-- demo for DOM. And: http://users.ecs.soton.ac.uk/lac/xml/checklinks.html

XPath: Expressions
/book/chapter/title: --/--/-- elements. Title element, inside chapter element, inside book element.
/book/*/title: title element can be in ANY element, inside book element.
/book//title: title element can be anywhere, as long as it's inside book element.
para/quote: quote element inside para element inside the current element. There is no slash at the beginning.
title|heading|label: either title or heading or label.
chapter[title]: chapter with a title element.
chapter[title="Chapter 1"]: chapter with a title element, and name of chapter.
chapter[1]: first chapter element.

para[@security="classified"]: attribute value in an element.
XPath:
Good: simple & expressive; both docs and data.
Bad: only works in particular contexts, in conjunction with DOM or XSLT.

21/10/10 XSLT
In the header, the processing instruction: <?xml-stylesheet type="text/css" href="book.css"?>
XSLT; XML with CSS.
CSS:
Selector: TITLE (from the DOM).
Style definitions: {font-weight:bold;}
Types of selectors: element name; list of elements; context (i.e. the author's name, title).
CSS is a box model (outside to inside): MARGIN, BORDER, PADDING, CONTENT.
CSS:
Good: simple; good for docs.
Bad: made to change; bad for data.

XSLT:
A HTML: complex semantics. They're all mixed together:
1. Data semantics
2. Presentation semantics
3. User interface semantics
4. Behavioural semantics
B XML: only has data semantics.
Processing the XML therefore needs:
1. DTD: formatting objects
2. XSL(T): language for formatting semantics. Language to transform the XML data.
The XSLT transforms. This is the difference between XSL and CSS. XSL uses FO (formatting objects). Crap standard, so most actually use CSS instead. XSL makes another XML document with the transformed data in it. This is the XML that uses the formatting vocabulary.
How a stylesheet works (diagram): Mydoc.xml (with Mydoc.dtd) and Style.xsl go into the XSLT Processor, which produces FOdoc.xml (with FO.dtd); the XSL Processor turns that into the web page. ALTERNATIVE TO FOdoc.xml and FO.dtd: the XSLT Processor produces Mydoc.html.

XSLT Stylesheet: templates listed. Each matches elements in the XML document, then specifies the new content to replace the element.
<stylesheet>
  <template match="rule"> <hr/> </template>
  <template match="swearword"> <b>*&amp;?*!</b> </template>
</stylesheet>
Above: the rule element is replaced with a horizontal rule; it's XHTML.
NAMESPACES: in the XSLT, NAMESPACES explain which is XHTML and which is XSL. A URL associates the elements with names. So you need to add into the header:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="rule"> <hr/> </xsl:template>
</xsl:stylesheet>
Then the output in the header too:
<xsl: etc>
<xsl:output method="html"/>
Recursively using templates is also possible:
<p> <apply-templates/> </p>
<apply-templates/> makes the children of the element be processed.
And you can pull in XPath:
<xsl:for-each select="XPath expr"> ... </xsl:for-each>
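Since XSLT picks its nodes with XPath, the path expressions from the XPath notes can be tried on their own. Python's standard library has no XSLT processor, so this sketch only illustrates the selection step, using the limited XPath subset that `xml.etree.ElementTree` supports; the small `book` document is invented for illustration, and paths are written relative to the book element instead of starting with /book.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring("""
<book>
  <title>Notes</title>
  <chapter>
    <title>Chapter 1</title>
    <para security="classified">secret <quote>q</quote></para>
  </chapter>
  <chapter>
    <title>Chapter 2</title>
    <para>open</para>
  </chapter>
  <appendix><title>Extras</title></appendix>
</book>
""")

# chapter/title: title inside chapter inside the current (book) element
chapter_titles = [t.text for t in book.findall("chapter/title")]
# */title: title inside ANY child element of book
any_child_titles = book.findall("*/title")
# .//title: title anywhere below book
all_titles = book.findall(".//title")
# chapter[title='Chapter 1']: chapter whose title has that text
named = book.findall("chapter[title='Chapter 1']")
# chapter[1]: first chapter element
first = book.findall("chapter[1]")
# para[@security='classified']: attribute value in an element
classified = book.findall(".//para[@security='classified']")
```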

COMP6049 Qualitative and Quantitative Research Methods
5/10/10
Assignment: Annotated Bibliography 30%: set week 3, due week 7. Research Plan 70%: set week 5, due week 11.
Module finishes before Christmas. No lessons in the weeks back after Christmas.
Timetable (date, time, room):
7/10    11-12   6/1081
11/10   12-1    2/1039
14/10   11-1    6/1081
19/10   4-5     12/3021
21/10   2-3     library level 4/4077

7/10/10 Su White. Learning Societies Lab.
How do disciplinary differences affect research practices? Methodological approaches: start a reading list in EndNote. Set up a glossary in Elgg.
Differences between:
METHOD: how; tools.
METHODOLOGY: approach; the way; principles.

Micro, macro and middle range theories (see slide for this):

OOPSLA abstract: for publishing papers within this community.
Biglan, 1973: differences in disciplines: Hard, Soft, Applied, Pure. Classification of disciplines and their subject-matter characteristics.

Look at Biglan's diagram in reference to the Web Science diagram:

Jack: "We're developing our own creole." Communication with older people and rural communities: Su does this research.

14/10/10 Gerry Stoker. Experimental and Comparative Methods in the Social Sciences
Effect / Cause

Diagram: several CAUSES (independent variables) affecting one EFFECT (dependent variable). WHY?

David Hume, C18th. Patterns. Leading to explaining A causes B. Cause precedes Effect.

METHOD 1: EXPERIMENT
INTERVENTION: researcher creates data. Way to isolate the variable.
CONTROL: group to gather data is established.
RANDOM ALLOCATION: of subjects of research. So look at features of two groups where the only difference is the intervention.

LOGIC: control factors; pre/post measurements; repeat and replicate for a handle on causality; isolate the variable.
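The logic of intervention, control group, random allocation and pre/post measurement can be sketched as a small simulation. Everything here is invented for illustration: the 200 subjects, their baseline scores, the noise, and the `ASSUMED_EFFECT` of 5 points that the intervention is supposed to have.

```python
import random
import statistics

rng = random.Random(1081)

# Hypothetical subjects, each with a made-up pre-measurement (baseline).
subjects = [{"id": i, "baseline": rng.gauss(50, 10)} for i in range(200)]

# RANDOM ALLOCATION: shuffle, then split into two equal groups.
shuffled = subjects[:]
rng.shuffle(shuffled)
treatment, control = shuffled[:100], shuffled[100:]

ASSUMED_EFFECT = 5.0  # invented size of the intervention's effect

def change_score(subject, treated):
    """Post-measurement minus pre-measurement for one subject."""
    noise = rng.gauss(0, 1)
    post = subject["baseline"] + noise + (ASSUMED_EFFECT if treated else 0.0)
    return post - subject["baseline"]

treated_changes = [change_score(s, True) for s in treatment]
control_changes = [change_score(s, False) for s in control]

# The only systematic difference between the groups is the intervention.
estimated_effect = statistics.mean(treated_changes) - statistics.mean(control_changes)
```

Because allocation is random and each subject is measured before and after, the difference in average change between the groups recovers something close to the assumed effect.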

Experiments: Lab and field approaches Can you see it in other instances? i.e. political sciences (Lijphart, 1971. Pp.684-5

Still a good way to think about what you're doing: statistical method; comparative analysis; case study selection; rational thought experiments.

Doubts:
In the lab: ARTIFICIALITY of the lab setting (Peters, CP. 1998: 48). In a controlled environment, people will act differently than in their everyday lives. But can establish INTERNAL VERACITY.
In the field: Can't change important things. Can't change details in the real world (Peters, CP. 1998: 212).

ETHICS:
Criticism: experiments can a) lead to harm, b) be misleading.
Justification: will the experiment result in social benefits? You can debrief afterwards.
Criticism: if what you are going to be doing in the experiment has benefits, why do some people miss out (i.e. the ones not involved in the experiment)?
Justification: because we don't know, which is why we are doing the experiment.

RISE IN EXPERIMENTS: Psychology; Economics; Politics; Social Policy. US in particular: policy experiments. Use of the internet.

MISTAKES:
Estimating effect size is hard.
Staff involved in doing it may do it wrong (accidentally or deliberately).
Attrition of participants from the programme (e.g. if they are being treated for something and they get better).
Estimating effects of system-wide reform with randomised trials is problematic because of contamination effects, i.e. the treatment group talking to the control group.

VALIDITY PROBLEMS:
INTERNAL: Care, openness, challenge to work. Will rest on you guessing mistakes beforehand and adjusting for them before you do the experiment.
EXTERNAL: Argue for accumulation of the evidence. If you change variations, will it still hold?

WHY EXPERIMENT?
Engagement and therefore relevance. Flexibility and adaptability.
THEORY, EXPERIMENT, EVIDENCE: creates dynamic relationship.

SO: Be explicit about value added. Quasi and natural possibilities. Mix methods, i.e. include observation.

COMPARATIVE METHOD IS LINKED TO EXPERIMENTAL METHOD.
METHOD 2: COMPARATIVE METHOD
John S. Mill:
Method of difference: Case A: y occurs; Case B: no y. Case A: x is present; Case B: no x. Therefore x causes y. BUT similarities between the cases are important.
Method of agreement: Across the cases, are causal factors x, y, w, z present? If a factor is not present, it drops out, until one factor remains. BUT what if there are unknown factors?
Problems sometimes: too many variables, not enough cases. But generally a good method.
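Mill's two methods can be read as simple set operations over the factors present in each case. The sketch below is one possible reading, with invented factor names; it assumes each case is described only by the set of factors present in it, and it cannot guard against the unknown factors mentioned above.

```python
def method_of_difference(case_with_outcome, case_without_outcome):
    """Factors present where the outcome occurs but absent where it does not."""
    return set(case_with_outcome) - set(case_without_outcome)

def method_of_agreement(cases_with_outcome):
    """Factors present in every case where the outcome occurs."""
    factor_sets = [set(case) for case in cases_with_outcome]
    if not factor_sets:
        return set()
    return set.intersection(*factor_sets)

# Method of difference: two otherwise similar cases (hypothetical factors).
case_a = {"x", "w", "z"}   # outcome y occurs
case_b = {"w", "z"}        # no y
print(method_of_difference(case_a, case_b))  # {'x'}

# Method of agreement: factors drop out until one remains.
cases = [{"x", "w"}, {"x", "z"}, {"x", "w", "z"}]
print(method_of_agreement(cases))  # {'x'}
```

With too many variables and not enough cases, either function can return several candidate factors, which is the problem noted above.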

N = number of cases.
For comparison method, most people use Large N. Therefore statistical logic explanation: cases become variables. In the 1960s, more and more N was added.
COMPARING CASES: Recently: Przeworski and Limongi, 1997. Looked at cases over time. Boix and Stokes, 2003. In opposition to this.
Natural Experiments: Small N. Large N assumes that cases are variables and not cases. Large N: analytical; probabilistic predictions.

Qualitative Comparative Analysis: Charles Ragin. Boolean approach to cases. Develop from cases a TRUTH TABLE. Cases are combinations of variables.

Case  A present?  B present?  C present?  D present?  Outcome
1     0           1           1           1           1
2     1           0           1           0           0
3     0           0           0           1           0

So after you work it out, express it as a BOOLEAN EXPRESSION: factor present = upper case; factor not present = lower case.
1 aBCD
2 aBcD
3 abcD
Therefore the argument will be aBD, as C and c are not relevant and they are not the same across the cases.
A variation of this is Fuzzy Set Analysis.

All overlap: A experiment; B statistics; C comparative. Needs logic from A and C. Methods of cause? Why?
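The upper/lower case encoding and the reduction from aBCD and aBcD to aBD can be sketched as below. This is a minimal illustration of a single Boolean simplification step, not Ragin's full procedure; the function names are hypothetical and the factor order A, B, C, D is assumed from the truth table.

```python
FACTORS = "ABCD"

def encode(row):
    """Turn a row of 0/1 values into a term: upper case = present, lower case = absent."""
    return "".join(f if present else f.lower() for f, present in zip(FACTORS, row))

def combine(term1, term2):
    """Merge two terms that differ in exactly one factor by dropping that factor.

    Returns None when the terms cannot be merged.
    """
    if len(term1) != len(term2):
        return None
    diffs = [i for i, (a, b) in enumerate(zip(term1, term2)) if a != b]
    if len(diffs) != 1:
        return None
    i = diffs[0]
    if term1[i].lower() != term2[i].lower():
        return None  # different factors at this position, not the same factor flipped
    return term1[:i] + term1[i + 1:]

print(encode((0, 1, 1, 1)))     # aBCD
print(combine("aBCD", "aBcD"))  # aBD  (C/c differs, so it drops out)
print(combine("aBCD", "abcD"))  # None (two factors differ)
```

Two terms are merged only when they share everything except one factor, which is why C drops out of aBCD and aBcD.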

18/10/10 Literature Surveys: Mark Weal
Coursework A: Annotated Bibliography. Research Questions.

Why write an annotated bibliography?
Relate research to larger dialogue within the literature.
Share related research with your reader.
Framework for establishing importance of your survey.
Benchmark for your work in that area.

What to use annotated bibliography for:
Frame a problem: present the literature in the introduction.
Review the literature in a separate section.
Present literature at the end for comparison/contrast.

Research Question.
1. Identify key words.
2. Search the catalogues: subject area on Soton Library homepage; Google Scholar; TDNet/WebCat; Web of Science; target specific databases.
3. Locate a selection of articles (c. 50+). Identify which are relevant to your work.
4. Skim articles and sift. Only look at titles and abstracts.
5. Apply WSSC. This is the classification for Web Science. Still new, may be lots of others.
6. Identify 10 papers.
7. Design a literature map. This map can include more than just the final ten papers.
8. Draft summaries of relevant articles.
9. Assemble the literature review (not for this assignment).

Search Strategies: Include this information in your submission:
What bibliographies did you use?
Search terms used.
Searches A, B, C, and combinations of the above, inc. Boolean operators used.
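Recording searches A, B and their Boolean combinations is easier if the query strings are built the same way each time. The sketch below shows one way to do that; the search terms are purely illustrative, and the exact Boolean syntax accepted differs between databases, so the output is an assumption rather than a query guaranteed to work everywhere.

```python
def any_of(terms):
    """Join alternative terms with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"

def all_of(groups):
    """Require one term from every group by joining the groups with AND."""
    return " AND ".join(any_of(group) for group in groups)

# Illustrative search terms only.
search_a = ["human computation", "crowdsourcing"]
search_b = ["awareness", "consent"]

query = all_of([search_a, search_b])
print(query)  # ("human computation" OR crowdsourcing) AND (awareness OR consent)
```

Keeping the term lists separate from the combined query makes it straightforward to report searches A and B individually as well as in combination.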

Make sure you explain why you have picked stuff that's not a publication, if you choose blog post write-ups or unpublished conference papers, etc.
CONSIDER:
Reliability
o Who published it?
o Who wrote it?
Authority
o New to author?
o Author's qualifications
o How was the work funded?
o Peer-reviewed?
Validity
o Rationale clear?
o How data was collected/analysed? Rigorous?
Relevance
o Duplication of other work?
o Relates to specific aspects of your work?

Try to pick papers from different disciplines.
CRESWELL, 2009. RESEARCH DESIGN. This seems to be the book that this module is based on.
For each reference:
1. Problem being addressed.
2. Purpose/focus of study.
3. Information on sample, population, subjects.
4. Key results.
5. Technical and methodological flaws.

Using WSSC, classify the papers. Michalis Vafopoulos wrote this.
Assignment: Coursework is 30% of module.
Brief summary of research question, including your focus.
Outline research strategy.
Literature map.
Ten key publications, including: full reference; paragraph summary; web science classification scheme keywords. Especially mention why each reference is so relevant to your research question.
Submit PDF to C-BASS on 19th November.
Marks: Summary 10%; Search Strategy 10%; Literature Map 30%; Bibliography 50%.

My Question:

How aware are people when they participate in human computation?

21/10/10 LIBRARY RESOURCES
Online bibliographies. Get to these from the Library homepage, via subject help and databases:
Law: WestLaw; within this, a good journal is European Intellectual Property Review.
ECS: IEEE Xplore; ACM; LNCS (full text access for these). Web of Science.
Sociology: Sociological Abstracts; Social Sciences Citation Index; CSA Illumina.
Don't forget Soton ePrints: eprints.soton.ac.uk and eprints.ecs.soton.ac.uk
Web of Science: http://wok.mimas.ac.uk (this is actually the ISI Web of Knowledge).

Other Lectures
20/10/10 Trust in Anonymity Networks, V. Sassone
Protocols for anonymity: Onion and Crowds.

Reiter and Rubin, 1998. Crowds, for anonymous web transactions. Cannot be totally anonymous because of IP, MAC and ISP. But the Crowd confuses the attacker because of mass. Introduces reasonable doubt. Users randomly allocate other users to get pages from a selection of servers. Probability = Pf. So only 1 user knows they are the initiator; the predecessor and successor are not hidden. But we are trying to hide information from peers and servers, not MI6. The ISP will always be trackable. Timing Analysis.

Probable innocence: on the sliding scale between Absolute Privacy and Provably Exposed. More likely that you get to Beyond Suspicion, Possible Innocence, or Probable Innocence. Also, not exclusive.
Crowds Protocol: using this protocol, Peers come to the conclusion of Probable Innocence; Servers come to the conclusion of Beyond Suspicion.

Syverson et al. ONION ROUTING. The initiator decides the whole path beforehand; not at all random. Data is wrapped in the nodes' ids before sending, so it needs to be decrypted by each user. Users' ids (the layers) are slowly revealed by users on the way to the server. So the message is hidden, as is the final destination.

Remaining uncertainty = initial uncertainty minus information leaked.
Halpern and O'Neill, 2005 and Reiter and Rubin, 1998: two different ways to find out the definition/metric. Now the agreed definition uses Information Theory: Palamidessi et al., similar to Information Flow Analysis. Uncertainty of motivator: Shannon entropy. Malacaria et al.: mutual information. Diaz et al. also.

TRUST metrics: based on notion of pseudo-identity, i.e. reputation models.

Proposed approach for computational trust in networks.
Things to consider: Concept of trust. Does this lead to autonomous decisions?
Affecting security requirements:
1. Scalability
2. Mobility
3. Incomplete information.
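Going back to the Crowds protocol and the entropy-based metric from this lecture, the two ideas can be sketched together. The code is a rough illustration under stated assumptions: the initiator always hands the request to a randomly chosen crowd member (possibly itself), each later member forwards again with probability pf and otherwise submits to the server, and uncertainty is measured as Shannon entropy in bits. The function names and the eight-member crowd are hypothetical.

```python
import math
import random

def crowds_path(crowd, initiator, pf, rng):
    """Build one Crowds-style path of members that relay a request."""
    if not 0 <= pf < 1:
        raise ValueError("pf must be in [0, 1) so that the path terminates")
    # The initiator always passes the request to a random member first.
    path = [initiator, rng.choice(crowd)]
    # Each later member forwards with probability pf, else submits to the server.
    while rng.random() < pf:
        path.append(rng.choice(crowd))
    return path

def shannon_entropy(probabilities):
    """Uncertainty in bits of a probability distribution over suspects."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

crowd = [f"user{i}" for i in range(8)]
rng = random.Random(0)
path = crowds_path(crowd, "user0", pf=0.75, rng=rng)

# Initial uncertainty: all 8 members equally likely to be the initiator.
initial = shannon_entropy([1 / 8] * 8)
# Suppose an observer narrows it down to 2 equally likely members.
remaining = shannon_entropy([1 / 2] * 2)
leaked = initial - remaining
print(initial, remaining, leaked)  # 3.0 1.0 2.0
```

The last three lines restate the relation from the notes: remaining uncertainty equals initial uncertainty minus the information leaked.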
