2015년 4월 5일 일요일

Eucalyptus block storage integration with Ceph

This week, my goal was to deploy Ceph as the block storage backend for Eucalyptus. I created three machines; each had 4 x 10 GB disks - one for the OS and three for OSDs.



As shown in the picture above, three machines were added for Ceph storage, and I also separated the Storage Controller node from the Cluster Controller node.
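For reference, the cluster01 working directory suggests a ceph-deploy based setup; creating the nine OSDs would have looked roughly like the lines below. The device names sdb/sdc/sdd are assumptions, and without an explicit journal device ceph-deploy carves the journal partition out of the same disk, which matters later.

# Sketch only: three OSDs per node, journals sharing the data disks (assumed device names)
ceph@ceph-node1:~/cluster01$ ceph-deploy osd create ceph-node1:sdb ceph-node1:sdc ceph-node1:sdd
ceph@ceph-node1:~/cluster01$ ceph-deploy osd create ceph-node2:sdb ceph-node2:sdc ceph-node2:sdd
ceph@ceph-node1:~/cluster01$ ceph-deploy osd create ceph-node3:sdb ceph-node3:sdc ceph-node3:sdd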

When I was deploying these, I encountered two problems.
  • Ceph was in HEALTH_WARN - 192 pgs incomplete / 192 pgs stuck inactive / 192 pgs stuck unclean
  • Eucalyptus Storage was in NOTREADY

I created just one cluster and then validated it. The status showed HEALTH_WARN.
ceph@ceph-node1:~/cluster01$ ceph osd tree
# id    weight  type name       up/down reweight
-1      0       root default
-2      0               host ceph-node1
0       0                       osd.0   up      1
1       0                       osd.1   up      1
2       0                       osd.2   up      1
-3      0               host ceph-node2
3       0                       osd.3   up      1
4       0                       osd.4   up      1
5       0                       osd.5   up      1
-4      0               host ceph-node3
6       0                       osd.6   up      1
7       0                       osd.7   up      1
8       0                       osd.8   up      1

ceph@ceph-node1:~/cluster01$ ceph status
    cluster 565bb65e-775d-449d-8d57-f36c7cf4a1d5
     health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean
     monmap e1: 1 mons at {ceph-node1=10.10.10.30:6789/0}, election epoch 2, quorum 0 ceph-node1
     osdmap e28: 9 osds: 9 up, 9 in
      pgmap v53: 192 pgs, 3 pools, 0 bytes data, 0 objects
            296 MB used, 45683 MB / 45980 MB avail
                 192 incomplete
The pools were there, but whenever I tried to query one, I couldn't get a result.
ceph@ceph-node1:~/cluster01$ rados lspools
data
metadata
rbd
ceph@ceph-node1:~/cluster01$ rados -p metadata ls
Because of the incomplete PGs, any request that has to reach the OSDs becomes a slow request, and query commands such as the rados -p metadata ls above simply hang.
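The monitors still respond even while client I/O hangs, so the stuck PGs can be listed with something like the following (a sketch, not output captured from my cluster):

ceph@ceph-node1:~/cluster01$ ceph health detail
ceph@ceph-node1:~/cluster01$ ceph pg dump_stuck inactive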

Meanwhile, I got a hint from the blog post "Ceph, Small Disks and Pgs Stuck Incomplete". It said that if a drive is small enough, the OSD's CRUSH weight can end up as 0.00 - and all of my OSDs' weights were zero. According to the post, the weight only becomes non-zero if the OSD has at least 10 GB (10 GB = 0.01). Although each HDD had 10 GB, half of it was partitioned as the Ceph journal, so each OSD had only 5 GB for storing data and its weight was rounded down to 0.00.
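As a rough check of that arithmetic (my own sketch, assuming the initial CRUSH weight is the usable size in TiB rounded to two decimals):

# A 10 GB disk rounds to 0.01, but a 5 GB data partition rounds down to 0.00
ceph@ceph-node1:~/cluster01$ awk 'BEGIN { printf "10G -> %.2f\n 5G -> %.2f\n", 10/1024, 5/1024 }'
10G -> 0.01
 5G -> 0.00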

So, I manually updated the CRUSH weights.
ceph@ceph-node1:~/cluster01$ ceph osd crush reweight osd.0 1
reweighted item id 0 name 'osd.0' to 1 in crush map
ceph@ceph-node1:~/cluster01$ ceph osd crush reweight osd.1 1
reweighted item id 1 name 'osd.1' to 1 in crush map
ceph@ceph-node1:~/cluster01$ ceph osd crush reweight osd.2 1
reweighted item id 2 name 'osd.2' to 1 in crush map
ceph@ceph-node1:~/cluster01$ ceph osd crush reweight osd.3 1
reweighted item id 3 name 'osd.3' to 1 in crush map
ceph@ceph-node1:~/cluster01$ ceph osd crush reweight osd.4 1
reweighted item id 4 name 'osd.4' to 1 in crush map
ceph@ceph-node1:~/cluster01$ ceph osd crush reweight osd.5 1
reweighted item id 5 name 'osd.5' to 1 in crush map
ceph@ceph-node1:~/cluster01$ ceph osd crush reweight osd.6 1
reweighted item id 6 name 'osd.6' to 1 in crush map
ceph@ceph-node1:~/cluster01$ ceph osd crush reweight osd.7 1
reweighted item id 7 name 'osd.7' to 1 in crush map
ceph@ceph-node1:~/cluster01$ ceph osd crush reweight osd.8 1
reweighted item id 8 name 'osd.8' to 1 in crush map

ceph@ceph-node1:~/cluster01$ ceph osd tree
# id    weight  type name       up/down reweight
-1      9       root default
-2      3               host ceph-node1
0       1                       osd.0   up      1
1       1                       osd.1   up      1
2       1                       osd.2   up      1
-3      3               host ceph-node2
3       1                       osd.3   up      1
4       1                       osd.4   up      1
5       1                       osd.5   up      1
-4      3               host ceph-node3
6       1                       osd.6   up      1
7       1                       osd.7   up      1
8       1                       osd.8   up      1

# Status is in HEALTH_OK
ceph@ceph-node1:~/cluster01$ ceph status
    cluster 565bb65e-775d-449d-8d57-f36c7cf4a1d5
     health HEALTH_OK
     monmap e1: 1 mons at {ceph-node1=10.10.10.30:6789/0}, election epoch 2, quorum 0 ceph-node1
     osdmap e56: 9 osds: 9 up, 9 in
      pgmap v122: 192 pgs, 3 pools, 0 bytes data, 0 objects
            316 MB used, 45664 MB / 45980 MB avail
                 192 active+clean

# Create pools for volumes and snapshots
ceph@ceph-node1:~/cluster01$ ceph osd pool create euca-volumes 128 128
ceph@ceph-node1:~/cluster01$ ceph osd pool create euca-snapshots 128 128

ceph@ceph-node1:~/cluster01$ ceph osd pool set euca-volumes size 2
set pool 4 size to 2
ceph@ceph-node1:~/cluster01$ ceph osd pool set euca-snapshots size 2
set pool 5 size to 2 
Ceph's status changed to HEALTH_OK, and commands no longer hung.
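As a quick check, listing the pools should now return immediately and show the two new ones (expected output, not a captured transcript):

ceph@ceph-node1:~/cluster01$ rados lspools
data
metadata
rbd
euca-volumes
euca-snapshots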

Next, I am going to explain how I solved the NOTREADY state of the Storage service. I had let this problem sit for a while; until now I was focused on launching VMs and didn't need to attach volumes or snapshots.

However, it was time to make it work. I always got the same result when running euca-describe-services.
[root@euca-clc ~]# euca-describe-services --all -E
...
SERVICE storage                 cluster01       sc-euca-clc             NOTREADY        25      http://10.10.10.170:8773/services/Storage       arn:euca:eucalyptus:cluster01:storage:sc-euca-clc/
ERROR   storage                 cluster01       sc-euca-clc             Failed to lookup host 10.10.10.170 for service arn:euca:eucalyptus:cluster01:storage:sc-euca-clc/.  Current hosts are: [Host 192.168.1.169 #25 /192.168.1.169 coordinator=192.168.1.169 booted db:synched(synced) dbpool:ok started=1428187521637 [/10.10.10.169, /192.168.1.169], Host 192.168.1.170 #25 /192.168.1.170 coordinator=192.168.1.169 booted nodb started=1428187908203 [/10.10.10.170, /192.168.1.170]]
SERVICEEVENT    1ea068a2-83ea-4007-a8aa-33bd1befa68d    arn:euca:eucalyptus:cluster01:storage:sc-euca-clc/
SERVICEEVENT    1ea068a2-83ea-4007-a8aa-33bd1befa68d    ERROR
SERVICEEVENT    1ea068a2-83ea-4007-a8aa-33bd1befa68d    Sun Apr 05 07:53:47 KST 2015

I had registered the Storage Controller on the private network (10.10.10.0/24), but the coordinator - I wasn't sure exactly what it was - was running on a different network (192.168.1.0/24). It occurred to me to try registering a Storage Controller on the same network as the coordinator.
[root@euca-clc ~]# euca_conf --register-sc --partition cluster01 --host 192.168.1.171 --component sc-euca-sc
SERVICE storage         cluster01       sc-euca-sc      BROKEN          29      http://192.168.1.171:8773/services/Storage       arn:euca:eucalyptus:cluster01:storage:sc-euca-sc/
After a while, I checked it again.
[root@euca-clc ~]# euca-describe-services --all
..
SERVICE storage                 cluster01       sc-euca-sc              ENABLED         62      http://192.168.1.171:8773/services/Storage      arn:euca:eucalyptus:cluster01:storage:sc-euca-sc/
SERVICE cluster                 cluster01       cc-euca-cc              ENABLED         62      http://10.10.10.170:8774/axis2/services/EucalyptusCC    arn:euca:eucalyptus:cluster01:cluster:cc-euca-cc/
SERVICE node                    cluster01       10.10.10.178            ENABLED         62      http://10.10.10.178:8775/axis2/services/EucalyptusNC    arn:euca:bootstrap:cluster01:node:10.10.10.178/
...
Finally, I got it working.
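One piece of cleanup remains: the original sc-euca-clc registration on 10.10.10.170 is still there. If I recall correctly it can be removed with euca_conf along these lines (a sketch; double-check the component name against euca-describe-services first):

[root@euca-clc ~]# euca_conf --deregister-sc --partition cluster01 --host 10.10.10.170 --component sc-euca-clc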

The next step will be configuring Eucalyptus properties to use Ceph. The following site explains those steps well - https://johnpreston78.wordpress.com/2015/02/21/eucalyptus-and-ceph-for-elastic-block-storage
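For my own notes, the linked post drives that configuration through euca-modify-property. The sketch below is only my reading of it - treat the property names, the cephx user, and the file paths as assumptions to verify against that post and the Eucalyptus 4.1 documentation:

# Assumed property names for Eucalyptus 4.1 with Ceph RBD - verify before use
euca-modify-property -p cluster01.storage.blockstoragemanager=ceph-rbd
euca-modify-property -p cluster01.storage.cephvolumepools=euca-volumes
euca-modify-property -p cluster01.storage.cephsnapshotpools=euca-snapshots
euca-modify-property -p cluster01.storage.cephuser=eucalyptus
euca-modify-property -p cluster01.storage.cephconfigfile=/etc/ceph/ceph.conf
euca-modify-property -p cluster01.storage.cephkeyringfile=/etc/ceph/ceph.client.eucalyptus.keyring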

2015년 3월 29일 일요일

Troubleshooting - Eucalyptus instances do not get a private IP address

I prepared 3 VMs on my VMware Workstation for testing Eucalyptus: a Cloud Controller (CLC), a Cluster Controller (CC), and a Node Controller (NC).

My setup was as follows:

  1. S/W: CentOS 6.6, Eucalyptus 4.1.0, euca2ools 3.2.0
  2. Network Mode: Managed(NOVLAN)
  3. IPs: Public 192.168.1.0/24, Private 10.10.10.0/24, Virtual Network: 172.16.0.0/16
A DHCP daemon (yellow in the diagram) runs on the CC node, and VMs on the NC node are supposed to get their IP addresses from this daemon.

I created a VM, but it failed to get an IP address; specifically, the VM could only get an IP after I turned off the firewall service on the NC node.

I printed the console output while the VM had no IP:
[root@euca-clc ~]# euca-get-console-output i-9521eb03
...
Cloud-init v. 0.7.4 running 'init-local' at Tue, 24 Mar 2015 22:00:23 +0000. Up 60.61 seconds.
Starting cloud-init: /usr/lib/python2.6/site-packages/cloudinit/url_helper.py:40: UserWarning: Module backports was already imported from /usr/lib64/python2.6/site-packages/backports/__init__.pyc, but /usr/lib/python2.6/site-packages is being added to sys.path
  import pkg_resources
Cloud-init v. 0.7.4 running 'init' at Tue, 24 Mar 2015 22:00:25 +0000. Up 61.88 seconds.
ci-info: +++++++++++++++++++++++Net device info+++++++++++++++++++++++
ci-info: +--------+------+-----------+-----------+-------------------+
ci-info: | Device |  Up  |  Address  |    Mask   |     Hw-Address    |
ci-info: +--------+------+-----------+-----------+-------------------+
ci-info: |   lo   | True | 127.0.0.1 | 255.0.0.0 |         .         |
ci-info: |  eth0  | True |     .     |     .     | d0:0d:dc:bc:09:70 |
ci-info: +--------+------+-----------+-----------+-------------------+
ci-info: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!Route info failed!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

The root cause was that I had configured the firewall incorrectly on the NC node. The important thing was to add FORWARD rules for the virtual network (172.16.0.0/16).
 
# Add FORWARD rules
[root@euca-nc01 ~]# iptables --append FORWARD  --proto udp --sport 68 --dport 67 --jump ACCEPT 
[root@euca-nc01 ~]# iptables --append FORWARD  --source 172.16.0.0/16 --jump ACCEPT 
[root@euca-nc01 ~]# iptables --append FORWARD  --destination 172.16.0.0/16 --jump ACCEPT 

# Recreate VM  
[root@euca-clc ~]# euca-run-instances $image_id --instance-type m1.small --key euca-default --group default
[root@euca-clc ~]# euca-get-console-output $ins_id
...
Starting cloud-init: /usr/lib/python2.6/site-packages/cloudinit/url_helper.py:40: UserWarning: Module backports was already imported from /usr/lib64/python2.6/site-packages/backports/__init__.pyc, but /usr/lib/python2.6/site-packages is being added to sys.path
  import pkg_resources
Cloud-init v. 0.7.4 running 'init-local' at Sun, 29 Mar 2015 07:56:50 +0000. Up 38.13 seconds.
Starting cloud-init: /usr/lib/python2.6/site-packages/cloudinit/url_helper.py:40: UserWarning: Module backports was already imported from /usr/lib64/python2.6/site-packages/backports/__init__.pyc, but /usr/lib/python2.6/site-packages is being added to sys.path
  import pkg_resources
Cloud-init v. 0.7.4 running 'init' at Sun, 29 Mar 2015 07:56:52 +0000. Up 40.55 seconds.
ci-info: +++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++
ci-info: +--------+------+--------------+-----------------+-------------------+
ci-info: | Device |  Up  |   Address    |       Mask      |     Hw-Address    |
ci-info: +--------+------+--------------+-----------------+-------------------+
ci-info: |   lo   | True |  127.0.0.1   |    255.0.0.0    |         .         |
ci-info: |  eth0  | True | 172.16.77.93 | 255.255.255.240 | d0:0d:f3:ab:33:87 |
ci-info: +--------+------+--------------+-----------------+-------------------+
ci-info: +++++++++++++++++++++++++++++++++Route info++++++++++++++++++++++++++++++++++
ci-info: +-------+--------------+--------------+-----------------+-----------+-------+
ci-info: | Route | Destination  |   Gateway    |     Genmask     | Interface | Flags |
ci-info: +-------+--------------+--------------+-----------------+-----------+-------+
ci-info: |   0   | 172.16.77.80 |   0.0.0.0    | 255.255.255.240 |    eth0   |   U   |
ci-info: |   1   |   0.0.0.0    | 172.16.77.81 |     0.0.0.0     |    eth0   |   UG  |
ci-info: +-------+--------------+--------------+-----------------+-----------+-------+

I asked around about what I should do to get IPs assigned properly, and one of my colleagues finally gave me a link with the perfect answer. The link provides guidance for configuring the following settings on the NC node.
 
# Generated by iptables-save v1.4.7 on Wed Mar  6 21:19:36 2013
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [294733:108329028]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT 
-A INPUT -p icmp -j ACCEPT 
-A INPUT -i lo -j ACCEPT 
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT 
-A INPUT -p tcp -m state --state NEW -m tcp --dport 8775 -j ACCEPT 
-A INPUT -j REJECT --reject-with icmp-host-prohibited 
-A FORWARD -p udp -m udp --sport 68 --dport 67 -j ACCEPT 
-A FORWARD -s 192.168.0.0/16 -j ACCEPT 
-A FORWARD -d 192.168.0.0/16 -j ACCEPT 
-A FORWARD -j REJECT --reject-with icmp-host-prohibited 
COMMIT
# Completed on Wed Mar  6 21:19:36 2013
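Since these machines run CentOS 6, rules added with iptables at runtime can be persisted into /etc/sysconfig/iptables (the file shown above) so they survive a reboot:

[root@euca-nc01 ~]# service iptables save
iptables: Saving firewall rules to /etc/sysconfig/iptables:[  OK  ]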

One more thing I need to mention: if a VM cannot reach the metadata server after it starts, please check whether TCP port 8773 is open on the CLC node.
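If it isn't, a rule like the following on the CLC node opens it (a sketch; adjust the position in your own rule set):

[root@euca-clc ~]# iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 8773 -j ACCEPT
[root@euca-clc ~]# service iptables save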

Let's look at the picture.
The metadata server address is typically 169.254.169.254 and is added to eth1 on the CC node, so it looks as if the metadata server is running on the CC node.

Now look at the next one.

Eucalyptus automatically adds a PREROUTING rule which sends metadata server traffic to port 8773 of the CLC node.
So the node that actually serves the metadata is the CLC node, not the CC node. As a result, you need to check the CLC node when dealing with metadata issues.
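You can see the rule for yourself by inspecting the nat table on the CC node (the exact rule text will vary, but it redirects 169.254.169.254 traffic to the CLC's port 8773):

# On the CC node, look for the DNAT rule that redirects metadata traffic to the CLC's port 8773
iptables -t nat -L PREROUTING -n | grep 169.254.169.254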


References:
1. https://www.eucalyptus.com/docs/eucalyptus/4.1.0/index.html#install-guide/configuring_iptables.html
2. https://eucalyptus.atlassian.net/browse/EUCA-5323

2014년 11월 26일 수요일

Cookbook application_java and a workaround for the error - undefined method 'create' for nil:NilClass

Recently, I have been developing cookbooks to deploy Java web applications. To do this efficiently, I needed to combine several cookbooks: java, tomcat, and application_java. As their names suggest, they install and configure Java-related software (the JDK, a Java servlet container, and the Java application itself).

Using the java and tomcat cookbooks wasn't a big deal, but I ran into an error with application_java. My code was as follows:
include_recipe 'java'

application 'shop-admin' do
    path '/var/shop-admin'
    repository 'http://192.168.56.170/zabbix/shop.admin.war'
    revision '1.0'
    scm_provider Chef::Provider::RemoteFile::Deploy

    java_webapp
    tomcat
end
My error message was:
...
================================================================================
Error executing action `deploy` on resource 'deploy_revision[shop-admin]'
================================================================================

NoMethodError
-------------
undefined method `create' for nil:NilClass

Cookbook Trace:
---------------
/var/chef/cache/cookbooks/application_java/libraries/provider_remote_file_deploy.rb:53:in `action_sync'
...
The solution was to replace the application_java cookbook. By default, Chef Supermarket links to https://github.com/poise/application_java.

Other people had already posted about the same issue, and someone recommended cloning from https://github.com/jamiely/application_java instead.
I replaced the cookbook with "jamiely/application_java", and after that my WAR deployed well.
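Swapping the cookbook in a local chef-repo can be done roughly like this (the paths and the knife upload step are assumptions about your own workflow):

# Replace the Supermarket copy of application_java with the jamiely fork
cd ~/chef-repo/cookbooks
rm -rf application_java
git clone https://github.com/jamiely/application_java
knife cookbook upload application_java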

2014년 11월 22일 토요일

Cost comparison of VMware and Red Hat Cloud

A comparison between VMware and Red Hat - Red Hat Cloud Infrastructure (RHCI includes OpenStack, and Red Hat currently maintains RDO). VMware is often, and still is, compared with the alternatives, but cost savings are the silver bullet for attracting IT managers. This image came from Red Hat's webinar - http://www.redhat.com/en/about/events/building-and-managing-hybrid-cloud-red-hat-cloud-infrastructure.

2014년 5월 10일 토요일

How to change the network model for a Windows instance in OpenStack.

I previously posted about installing virtio drivers on Windows 2012 R2. That image was intended to be used as an OpenStack Glance image.

I used this image to create a Windows instance with the nova command, and I found the instance had no network connectivity at all after boot.


My last post described how to install the virtio SCSI controller (disk) and balloon (memory) drivers; the network driver was not included.

In OpenStack, a compute node running the KVM hypervisor attaches a virtual network interface of type "virtio" to instances by default. My instance also had a virtio interface.
$ virsh dumpxml 
...
    <interface type='bridge'>
      <mac address='fa:16:3e:82:9d:c3'/>
      <source bridge='br-int'/>
      <virtualport type='openvswitch'>
        <parameters interfaceid='a3bc1442-e121-4730-9f6f-1abff6466f11'/>
      </virtualport>
      <target dev='tapa3bc1442-e1'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
This happened because I hadn't installed the virtio network driver in the Glance image. So, to enable networking for the instance, I changed the network interface type of the Glance image from "virtio" to "e1000" (an emulated Intel NIC for which most guests, including Windows, ship a driver).

$ glance image-update --property hw_vif_model=e1000 $image_id2
+-------------------------+--------------------------------------+
| Property                | Value                                |
+-------------------------+--------------------------------------+
| Property 'hw_vif_model' | e1000                                |
| checksum                | 572850147e8f2cf1814e4953065a6421     |
| container_format        | bare                                 |
| created_at              | 2014-05-09T04:16:56                  |
| deleted                 | False                                |
| deleted_at              | None                                 |
| disk_format             | qcow2                                |
| id                      | 0ec11912-6634-4e09-bf09-c97373da2a47 |
| is_public               | True                                 |
| min_disk                | 0                                    |
| min_ram                 | 0                                    |
| name                    | windows2012r2                        |
| owner                   | None                                 |
| protected               | False                                |
| size                    | 10739318784                          |
| status                  | active                               |
| updated_at              | 2014-05-09T07:48:21                  |
+-------------------------+--------------------------------------+

After updating, I created a second instance. This time, the network in the second instance worked well.


Let's look at how the network interface type appears for the second instance.
$ virsh dumpxml 
...
    <interface type='bridge'>
      <mac address='fa:16:3e:57:cf:41'/>
      <source bridge='br-int'/>
      <virtualport type='openvswitch'>
        <parameters interfaceid='1a3a7af8-945e-425f-a39f-e6a48b7d87f3'/>
      </virtualport>
      <target dev='tap1a3a7af8-94'/>
      <model type='e1000'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>

As for installing the virtio network driver, I added the steps to my Evernote; please refer to that link for further information. Of course, you don't need to change the network type if you've already installed the virtio network driver.
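For completeness, once the virtio network driver is installed in the image, the property can presumably be switched back so new instances get the faster virtio NIC (untested sketch):

$ glance image-update --property hw_vif_model=virtio $image_id2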

2014년 5월 5일 월요일

Installing Windows Server 2012 R2 with VirtIO on KVM/QEMU.


I'd like to share my Evernote about installing Windows Server 2012 R2 with virtio on KVM/QEMU.

VirtIO drivers, which enable para-virtualized I/O in KVM, generally provide better performance than emulated devices under full virtualization.



Tests in this note ran on an Ubuntu 12.04 machine.
Here's my note : https://www.evernote.com/shard/s63/sh/7ca831a4-b275-4c6f-8886-4ba9103c0af3/5a912740e9ab14f342e3c194974390eb
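As a rough illustration of how such a guest can be defined with a virtio disk and NIC from the start, with the virtio-win driver ISO attached during installation - all names, sizes, and paths below are placeholders, not the exact invocation from my note:

# Sketch: Windows Server 2012 R2 guest with virtio disk/NIC and the virtio-win driver ISO attached
virt-install \
  --name win2012r2 \
  --ram 4096 --vcpus 2 \
  --os-type windows \
  --disk path=/var/lib/libvirt/images/win2012r2.qcow2,size=40,bus=virtio \
  --disk path=/isos/virtio-win.iso,device=cdrom \
  --cdrom /isos/win2012r2.iso \
  --network network=default,model=virtio \
  --graphics vnc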



2013년 8월 12일 월요일

How to solve the message "Starting nagios:No directory, logging in with HOME=/" when starting the nagios daemon

I wrote a script for installing Nagios 3 named "install-nagios3.sh" in my GitHub repository (https://github.com/yeonki-choi/nagios) and tested it. After running it, everything was fine except for the message "Starting nagios:No directory, logging in with HOME=/" when starting the nagios daemon.
$ sudo service nagios start
Starting nagios:No directory, logging in with HOME=/
done.

I found that the same message was produced when I switched to the user "nagios". It was caused by the user "nagios" having no home directory: the script creates the user without one, purely so that it can own the Nagios installation directories and run the daemon.

$ sudo su - nagios
No directory, logging in with HOME=/
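A quick check shows what's going on - the passwd entry for "nagios" points at /home/nagios, but the directory itself doesn't exist (illustrative output):

$ getent passwd nagios | cut -d: -f6
/home/nagios
$ ls -ld /home/nagios
ls: cannot access /home/nagios: No such file or directory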

To solve this, just create a home directory for that user:

# Make a directory and change the ownership
$ sudo mkdir /home/nagios
$ sudo chown -R nagios:nagios /home/nagios

# Setting the directory as the home of nagios
$ sudo usermod --home /home/nagios nagios
usermod: no changes

# Restart the nagios daemon; this time the message no longer appears
$ sudo service nagios restart
Running configuration check...done.
Stopping nagios: done.
Starting nagios: done.
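A cleaner long-term fix would be for the install script to create the user with a home directory in the first place; something like the line below (a sketch - the real script may use different options and groups):

# Create the nagios user with a home directory from the start
$ sudo useradd --create-home --home-dir /home/nagios nagios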