    and <code>%%time</code> is one of them. It will output the total execution time of a cell.
</details>


- We have only queried 50 of our 100 images for urls.
- To view only the subset of records with urls, use [boolean indexing](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing)

```python
df_images[
    df_images["userid"] != 0].head()
```

- What happens in the background is that `df_images["userid"] != 0` returns `True` for all records where "userid" is not 0 (the default value).
- In the second step, this boolean result is used to select records via boolean indexing: only the rows where the condition is `True` are kept.
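
To make the two steps explicit, here is a minimal sketch on a small, made-up DataFrame. The name `df_demo` and its values are illustrative assumptions and are not taken from the Wikimedia query:

```python
import pandas as pd

# hypothetical toy data; userid 0 stands for "not yet queried"
df_demo = pd.DataFrame({
    "title": ["a.jpg", "b.jpg", "c.jpg"],
    "userid": [0, 42, 7]})

# step 1: the comparison returns a boolean Series (one True/False per row)
mask = df_demo["userid"] != 0

# step 2: passing the Series in square brackets keeps only the True rows
df_queried = df_demo[mask]

print(mask.tolist())
print(df_queried["title"].tolist())
```

Storing the condition in a variable such as `mask` is optional, but it can help readability when several conditions are combined.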


Next (optional) step: **Save queried data to CSV**
    
- dataframes can be easily saved (and loaded) to (from) CSV using [pd.DataFrame.to_csv()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html)
- there's also [pd.DataFrame.to_pickle()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_pickle.html)
- a general recommendation is to use `to_csv()` for archive purposes ..
- ..and `to_pickle()` for intermediate, temporary files stored and loaded to/from disk

```python
df_images[df_images["userid"] != 0].to_csv(
    OUTPUT / "wikimedia_commons_sample.csv")
```
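
The difference between the two formats can be sketched with a round trip. The toy DataFrame `df_demo`, the file names, and the temporary folder are assumptions for illustration only. CSV is plain text, so the index has to be restored explicitly on load, whereas pickle is a binary format that restores the DataFrame as it was saved:

```python
import tempfile
from pathlib import Path

import pandas as pd

# hypothetical toy data
df_demo = pd.DataFrame(
    {"title": ["b.jpg", "c.jpg"], "userid": [42, 7]})

with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    # CSV: human-readable; the index is written as the first column
    df_demo.to_csv(tmp_dir / "demo.csv")
    df_csv = pd.read_csv(tmp_dir / "demo.csv", index_col=0)
    # pickle: binary; dtypes and index are restored without extra arguments
    df_demo.to_pickle(tmp_dir / "demo.pkl")
    df_pkl = pd.read_pickle(tmp_dir / "demo.pkl")

print(df_csv)
print(df_pkl.equals(df_demo))
```

Pickle files should only be loaded from sources you trust, which is one more reason to prefer CSV for archiving and sharing.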

<div class="alert alert-warning" role="alert" style="color: black;">
    <details><summary><strong>Open CSV in the Explorer</strong></summary>
        <br>
        <div style="width:500px"><ul>
            <li>Click the link <a href="out/wikimedia_commons_sample.csv">wikimedia_commons_sample.csv</a>, to have a look at the structure of the generated CSV.</li>
            <li>Jupyter Lab provides several renderers for typical file formats such as CSV, JSON, or HTML</li>
            <li>Notice some of the full user names provided in the list. We will use this sample data in the second notebook, to explore privacy aspects of VGI.</li>
            </ul> </div>
    </details>
</div>


Create two point layers, one for images with url and one for those without:

```python
images_layer_thumbs = gv.Points(
    df_images[df_images["thumb_url"].notna()],
    kdims=['lon', 'lat'],
    vdims=['thumb_url', 'user', 'timestamp', 'title'],
    label='Picture (with thumbnail)') 
images_layer_nothumbs = gv.Points(
    df_images[df_images["thumb_url"].isna()],
    kdims=['lon', 'lat'],
    label='Picture') 
```

<div class="alert alert-info" role="alert" style="color: black;">
    <details><summary><strong>kdims and vdims?</strong></summary>
        <br>
        <div style="width:500px">
            <ul>
                <li>kdims refers to the <strong>key</strong>-dimensions, which provide the primary references for plotting to x/y axes (coordinates)</li>
                <li>vdims refers to the <strong>value</strong>-dimensions, which provide additional information that is shown
                in the plot (e.g. colors, size, tooltips)</li>
                <li>Each string in the list refers to a column in the dataframe.</li>
                <li>Anything that is not included here in the layer-creation cannot be shown during plotting.</li>
            </ul>
        </div>
    </details>
</div>

```python
margin = 500 # meters
bbox_bottomleft = (x - margin, y - margin)
bbox_topright = (x + margin, y + margin)
```
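
If the bounding box is needed for several locations, the same arithmetic can be wrapped in a small helper. This is a sketch only: the function name `bbox_from_point` is hypothetical, and it assumes that `x` and `y` are projected coordinates in meters, so that the margin can simply be added and subtracted.

```python
from typing import Tuple

Coordinate = Tuple[float, float]


def bbox_from_point(
        x: float, y: float,
        margin: float = 500) -> Tuple[Coordinate, Coordinate]:
    """Return (bottom-left, top-right) corners of a square box around x/y"""
    bottomleft = (x - margin, y - margin)
    topright = (x + margin, y + margin)
    return bottomleft, topright


# example with made-up coordinates (meters)
demo_bottomleft, demo_topright = bbox_from_point(1000.0, 2000.0)
print(demo_bottomleft, demo_topright)
```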

```python
from typing import List, Optional

from bokeh.models import HoverTool


def get_custom_tooltips(
        items: List[str], thumbs_col: Optional[str] = None) -> str:
    """Compile HoverTool tooltip formatting with items to show on hover,
    optionally including a thumbnail image loaded from a url column"""
    tooltips = ""
    if items:
        # one line per item: column name as label, @column as value
        tooltips = "".join(
            f'<div><span style="font-size: 12px;">'
            f'<span style="color: #82C3EA;">{item}:</span> '
            f'@{item}'
            f'</span></div>' for item in items)
    if thumbs_col:
        # only append the image tag if a thumbnail column was provided
        tooltips += f'''
            <div><img src="@{thumbs_col}" alt="" style="height:170px"></img></div>
            '''
    return tooltips
```

<div class="alert alert-info" role="alert" style="color: black;">
    <details><summary><strong>Bokeh custom styling</strong></summary>
        <br>
        <div style="width:500px">
        <ul>
            <li>The above code to customize Hover tooltips is shown for demonstration purposes only</li>
            <li>As is obvious, such customization can become quite complex</li>
            <li>Below, it is also shown how to use the default Hover tooltips, which is the recommended
            way for most situations</li>
            <li>In this case, Holoviews will display tooltips for any DataFrame columns that are provided as <strong>vdims</strong>
             (e.g.: vdims=['thumb_url', 'user', 'timestamp', 'title'])</li>
                </ul>
        </div>
    </details>
</div>

```python
def set_active_tool(plot, element):
    """Enable wheel_zoom in bokeh plot by default"""
    plot.state.toolbar.active_scroll = plot.state.tools[0]

# prepare custom HoverTool
tooltips = get_custom_tooltips(
    thumbs_col='thumb_url', items=['title', 'user', 'timestamp'])
hover = HoverTool(tooltips=tooltips) 
    
gv_layers = hv.Overlay(
    gv.tile_sources.EsriImagery * \
    places_layer.opts(
        tools=['hover'],
        size=20,
        line_color='black',
        line_width=0.1,
        fill_alpha=0.8,
        fill_color='red') * \
    images_layer_nothumbs.opts(
        size=5,
        line_color='black',
        line_width=0.1,
        fill_alpha=0.8,
        fill_color='lightblue') * \
    images_layer_thumbs.opts(
        size=10,
        line_color='black',
        line_width=0.1,
        fill_alpha=0.8,
        fill_color='lightgreen',
        tools=[hover])
    )
```

```python
places_layer.opts(
        tools=['hover'],
        size=20,
        line_color='black',
        line_width=0.1,
        fill_alpha=0.8,
        fill_color='red')
```

<div class="alert alert-info" role="alert" style="color: black;">
    <details><summary><strong>Combining Layers</strong></summary>
        <div style="width:500px">
        <ul>
            <li>The syntax to combine layers is either <strong>*</strong> or <strong>+</strong></li>
            <li><strong>* (multiply)</strong> will overlay layers</li>
            <li><strong>+ (plus)</strong> will place layers next to each other, in separate plots</li>
            <li>The <strong>\ (backslash)</strong> is Python's convention for line continuation, <br>to break long lines</li>
            <li>The resulting layer-list is a <strong>hv.Overlay</strong>, which can be used for defining global plotting criteria</li>
                </ul>
        </div>
    </details>
</div>


**Store map as static HTML file**

```python
gv_layers.opts(
    projection=ccrs.GOOGLE_MERCATOR,
    title=df.loc[0, "name"],
    responsive=True,
    xlim=(bbox_bottomleft[0], bbox_topright[0]),
    ylim=(bbox_bottomleft[1], bbox_topright[1]),
    data_aspect=0.45, # maintain fixed aspect ratio during responsive resize
    hooks=[set_active_tool])
hv.save(
    gv_layers, OUTPUT / 'geoviews_map.html', backend='bokeh')
```

<div class="alert alert-warning" role="alert" style="color: black;">
    <details><summary><strong>Open map in new tab</strong></summary>
        <br>
        <div style="width:500px">
            In the file explorer on the left, go to notebooks/out/ and open geoviews_map.html with a right-click: Open in New Browser Tab.
        </div>
    </details>
</div>


**Display in-line view of the map:**

```python
gv_layers.opts(
    width=800,
    height=480,
    responsive=False,
    hooks=[set_active_tool],
    title=df.loc[0, "name"],
    projection=ccrs.GOOGLE_MERCATOR,
    data_aspect=1,
    xlim=(bbox_bottomleft[0], bbox_topright[0]),
    ylim=(bbox_bottomleft[1], bbox_topright[1])
    )
```

## Create Notebook HTML

- For archive purposes, we can convert the entire notebook, including interactive maps and graphics, to an HTML file.
- The command is invoked through the exclamation mark (**!**), which means: instead of python, use the command line.

Steps:  
- Create a single HTML file in ./out/ folder
- suppress console output by closing stdout and stderr (`>&- 2>&-`)
- use nbconvert template

<div class="alert alert-info" role="alert" style="color: black;">
    Make sure to <strong>Save your Notebook</strong> before this step.
</div>

```python
!jupyter nbconvert --to html_toc \
    --output-dir=../resources/html/ ./01_raw_intro.ipynb \
    --template=../nbconvert.tpl \
    --ExtractOutputPreprocessor.enabled=False >&- 2>&- # create single output file
```
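
Outside of a notebook, where the `!` shortcut is not available, a comparable call can be issued with Python's `subprocess` module. The following is a sketch under assumptions: the helper names `build_nbconvert_cmd` and `run_quietly` are hypothetical, the standard `html` exporter is used in place of `html_toc`, and the custom template is left out. Output is redirected to the null device rather than closed, which has the same practical effect of silencing the command.

```python
import subprocess
from typing import List


def build_nbconvert_cmd(
        notebook: str, output_dir: str,
        to_format: str = "html") -> List[str]:
    """Assemble the nbconvert call as an argument list"""
    return [
        "jupyter", "nbconvert", "--to", to_format,
        f"--output-dir={output_dir}", notebook,
        "--ExtractOutputPreprocessor.enabled=False"]


def run_quietly(cmd: List[str]) -> int:
    """Run a command, discard stdout and stderr, return the exit code"""
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode


cmd = build_nbconvert_cmd("./01_raw_intro.ipynb", "../resources/html/")
print(" ".join(cmd))
# run_quietly(cmd)  # uncomment to start the conversion
```

Checking the returned exit code (0 means success) is a simple way to notice a failed conversion even though all console output is discarded.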

## Summary


<div class="alert alert-warning" role="alert" style="color: black;">
    <details><summary><strong>There are many APIs</strong></summary>
        <div style="width:500px">
        <ul>
            <li>But it is quite complex to query data, since each API has its own syntax</li>
            <li>Each API also returns data structured differently</li>
            <li>Just because it is possible to access data doesn't mean that it is 
            allowed or ethically correct to use the data (e.g. Instagram)</li>
           </ul>
        </div>
    </details>
</div>


<div class="alert alert-info" role="alert" style="color: black;">
    <details><summary><strong>There are (some) Solutions</strong></summary>
        <div style="width:500px">
        <ul>
            <li>Typically, you would not write raw queries using <code>requests</code>. For many APIs, packages exist that ease workflows.</li>
            <li>We have prepared <u><a href="https://lbsn.vgiscience.org/">LBSN Structure</a></u>, a common format 
            to handle cross-network Social Media data (e.g. Twitter, Flickr, Instagram). Using this structure reduces the work that is necessary
            and allows visualizations to be adapted to other data easily.</li>
            <li>Privacy is critical, and we will explore one way to reduce the amount of data stored in the following notebook.</li>
           </ul>
        </div>
    </details>
</div>



<div class="alert alert-success" role="alert" style="color: black;">
    <details><summary><strong>This is just an introduction</strong></summary>
        <div style="width:500px">
        <br>
    We have only covered a small number of steps. If you want to continue,
    we recommend trying the following tasks:
    <ul>
        <li>Start and run this notebook locally, on your computer, for example, using our <a href="https://gitlab.vgiscience.de/lbsn/tools/jupyterlab">IfK Jupyter Lab Docker Container</a>, or re-create the environment manually with <a href="https://docs.conda.io/en/latest/miniconda.html">Miniconda</a></li>
        <li>Explore further visualization techniques, for example:
            <ul><li>Create a static line or scatter plot for temporal contribution of the <code>df_images</code> dataframe, see <a href="https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html">Pandas Visualization</a></li>
                <li>Repeat the same with <a href="http://holoviews.org/user_guide/Plotting_with_Bokeh.html">Holoviews</a>, to create an interactive line/scatter plot.</li></ul></li>
    </ul>
        </div>
    </details>
</div>

<!-- #region -->
**Contributions:**

- **2021 Workshop**: Multigrid display was contributed by Silke Bruns (MA), many thanks!
```python
plt.subplots_adjust(bottom=0.3, right=0.8, top=0.5)
ax = plt.subplot(3, 5, ix + 1)
```

<!-- #endregion -->

```python tags=["hidden"]
root_packages = [
    'python', 'adjusttext', 'contextily', 'geoviews', 'holoviews', 'ipywidgets',
    'ipyleaflet', 'geopandas', 'cartopy', 'matplotlib', 'shapely',
    'bokeh', 'fiona', 'pyproj', 'ipython', 'numpy']
tools.package_report(root_packages)
```