Introduction
This document provides a step-by-step procedure for upgrading the base MapR packages from 3.0.2 to
4.0.2, along with the ecosystem components. It is intended for validation on the AXP sandbox cluster
and can be adapted for use in additional environments. With MapR v4.x, the cluster can run
hadoop1 (classic) and hadoop2 (YARN) workloads simultaneously. The goal is to provide a cluster
with 80% of capacity initially dedicated to hadoop1 (classic) and 20% to hadoop2 (YARN). Over time,
capacity can be shifted incrementally to provide a higher percentage for hadoop2 (YARN). American
Express currently has 3 pilot clusters running MapR v4.0.2, and many of the configuration details in
this document were developed as part of that pilot program.
Scope
This document covers the MapR base packages and ecosystem packages, as well as the updates needed
for configuration files and the hadoop1 / hadoop2 configuration on the edge nodes.
JDK Version | MapR 2.x | MapR 3.0.x | MapR 3.1.x | MapR 4.0.x
JDK 6       | Yes      | Yes        | Yes        | No
JDK 7       | No       | Yes        | Yes        | Yes
JDK 8       | No       | No         | No         | Yes
Component | Version | MapR 3.0.x | MapR 4.0.2 (MapReduce v1 mode) | MapR 4.0.2 (YARN mode)
Apache Hive | - | No | No | No
Apache Hive | 0.10 | Yes | No | No
Apache Hive | 0.11 | Yes | No | No
Apache Hive | 0.12 | Yes | Yes | Yes
Apache Hive | 0.13 | Yes | Yes | Yes
Apache Spark | 0.9.1 | Yes | No | No
Apache Spark | 0.9.2 | Yes | No | No
Apache Spark | 1.0.2 | Yes | Yes | Yes
Apache Spark | - | No | Yes | Yes
Apache Spark | 1.1.1 | Yes | No | No
Impala | 1.2.3 | Yes | Yes | No
Impala | 1.4.1 | No | Yes | No
Apache Pig | 0.10 | Yes | Yes | No
Apache Pig | 0.11 | Yes | No | No
Apache Pig | 0.12 | Yes | Yes | Yes
Apache Pig | 0.13 | No | Yes | Yes
Apache Flume | 1.3.1 | Yes | No | No
Apache Flume | 1.4.0 | Yes | No | N/A
Apache Flume | 1.5 | No | Yes | N/A
Apache Sqoop | 1.4.3 | Yes | No | No
Apache Sqoop | 1.4.4 | Yes | Yes | Yes
Apache Sqoop | 1.4.5 | Yes | Yes | Yes
Apache Sqoop2 | 1.99.0 | Yes | Yes | Yes
Apache Mahout | 0.7 | Yes | No | No
Apache Mahout | 0.8 | Yes | No | No
Apache Mahout | 0.9 | No | Yes | Yes
Apache Oozie | 3.3.2 | Yes | No | No
Apache Oozie | 4.0.0 | Yes | No | No
Apache Oozie | 4.0.1 | No | Yes | Yes
Apache Oozie | - | Yes | No | No
Hue | 3.5 | Yes | No | No
Hue | 3.6 | No | Yes | Yes
Apache HBase | 0.92.2 | No | No | No
Apache HBase | 0.94.17 | Yes | No | No
Apache HBase | 0.94.21 | Yes | No | No
Apache HBase | 0.98.4 | No | Yes | Yes
Apache HBase | 0.98.7 | No | Yes | Yes
- | 0.5 | No | Yes | N/A
- | 0.6 | No | Yes | N/A
- | 0.6R2 | No | Yes | N/A
- | 0.7 | No | Yes | N/A
Asynchbase | 1.4.1 | Yes | No | No
Asynchbase | 1.5 | No | Yes | Yes
Cascading | 2.1.6 | Yes | No | No
Cascading | 2.5 | No | Yes | Yes
Whirr | 0.8.1 | No | No | No
HTTPFS | - | Yes | Yes | N/A
Apache Tez (Developer Preview) | 0.4 | N/A | N/A | Yes
MapReduce | 1.0.3 | Yes | Yes | N/A
MapReduce | 2.5.1 | N/A | N/A | Yes
Storm | 0.9.3 | N/A | Yes | N/A
Sentry | 1.4.0 | No | Yes | Yes
Component | Version(s) | Current Build | New Build
Apache Hive | 0.12 | 0.12.23716 | 0.12.201502021326
Apache Pig | 0.12 | 0.12.23716 | 0.12.27259
Apache Flume | 1.4.0 / 1.5.0 | 1.4.0.23547 | 1.5.0.201501191849
Apache Sqoop | 1.4.4 | 1.4.4.22554 | 1.4.4.201411051136
Apache Mahout | 0.7 / 0.9 | 0.7.22084 | 0.9.201409041745
Apache Oozie | 3.3.2 / 4.0.1 | 3.3.2.23554 | 4.0.1.201501231601
Apache HBase | 0.94.13 / 0.98.7 | 0.94.13.23554 | 0.98.7.201501291259
Apache Cascading | 2.1 / 2.5 | 2.1.20130606 | 2.5
Apache Whirr | 0.8.1 | 0.8.1.18380 | NA
Apache Zookeeper | 3.3.6 / 3.4.5 | 3.3.6 | 3.4.5
Apache Storm | 0.9.3 | - | 0.9.3
Oracle Java | - | 1.6.0_33 | 1.7.0_67 or later
MySQL Server | - | 5.6.1 | 5.6.1 (unchanged)
LWS-Solr | - | 2.6.3 | 2.6.3 (unchanged)
Elastic Search | - | 1.2.1 | 1.2.1 (unchanged)
Kognitio | - | 8.01.00-rel141029 | -
Memcached | - | 1.4.4-3.el6 | -
Talend | - | 5.4.1 | 5.6.1
Platfora | - | 4.0.3 | 4.1.X
Datameer | - | 4.5.6 | Under certification
Revolution R | - | 7.3.0 | Under certification
Dataguise | 4.4.2.10 / 4.4.2.11 | - | -
Python | - | 2.6.6 | 2.6.6 (unchanged)
Redhat Linux | - | 6.3 | Up to 6.6
Upgrade Outline
The high-level plan describes the order in which clusters and cluster infrastructure should be upgraded.
Specific details and configuration can be found later in the document.
1. Upgrade to Redhat Linux 6.6
Linux should be upgraded prior to the upgrade of any MapR components. Any issues related to the
Linux upgrade should be resolved before the MapR upgrade begins. The upgrade consists of updating the
Linux packages in StackIQ and running yum upgrade. This can be performed in a rolling fashion on a
subset of nodes and requires a reboot once complete. Each node can then rejoin the cluster and run
workloads as usual.
Note: Any vendor specific drivers (Cisco UCS, IBM 3650) should also be updated as part of the process.
5. Upgrade to Java 1.7.x
Once all cluster nodes are running Redhat 6.6, Oracle Java 1.7.0_67 (or later) can be installed. The
package is released as an RPM, so it can be installed via StackIQ. Once installed, the JAVA_HOME variable
should be set via /opt/mapr/conf/env.sh or /etc/alternatives. The upgrade does not require a reboot.
To configure the CPU / memory / disk split for 80% MR1, set the following in warden.conf:
Config file
/opt/mapr/conf/warden.conf:
mr1.memory.percent=80
mr1.cpu.percent=80
mr1.disk.percent=80
Note: The memory will be allocated as a remainder after all other heap space is accounted for (fileserver,
nodemanager, tt, hbase-regionserver). The percentage is based on the remaining memory on the
server. For M7, 4x CPUs are reserved for the fileserver; on M5, 2x CPUs are reserved for the fileserver.
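The note above can be illustrated with a small arithmetic sketch (a simplification, not warden's exact formula; the heap sizes below are invented for the example):

```python
def mr1_memory_mb(total_mb, service_heaps_mb, mr1_percent=80):
    """Illustrative only: MR1 receives mr1_percent of the memory that
    remains after the fixed service heaps are subtracted."""
    remaining = total_mb - sum(service_heaps_mb.values())
    return remaining * mr1_percent // 100

# Hypothetical 128 GB node with example heap reservations (MB)
heaps = {"fileserver": 35840, "nodemanager": 2048,
         "tasktracker": 512, "hbase-regionserver": 4096}
print(mr1_memory_mb(131072, heaps))  # MR1 share of the remainder
```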
Jobs can be submitted to either MR1 or MR2. When the hadoop_version environment variable is set
to classic, the hadoop command points to /opt/mapr/hadoop/hadoop-0.20.2/bin/hadoop. The
hadoop1 and hadoop2 commands are also available, pointing to MR1 and MR2 respectively.
13. Fair Scheduler Configuration (YARN)
The fair scheduler for YARN is configured differently than the JobTracker fair scheduler. The JobTracker
fair scheduler configuration should remain the same. To enable the fair scheduler for YARN:
Config file
/opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop/yarn-site.xml:
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.scheduler.fair.allocation.file</name>
<value>fairscheduler.xml</value>
</property>
<property>
<name>yarn.scheduler.fair.preemption</name>
<value>true</value>
</property>
<property>
<name>yarn.scheduler.fair.allow-undeclared-pools</name>
<value>false</value>
</property>
<property>
<name>yarn.scheduler.fair.user-as-default-queue</name>
<value>false</value>
</property>
Notes:
The queues are configured in /opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop/fair-scheduler.xml
Preemption is enabled
We do not allow undeclared pools (all queues must be defined)
We do not allow user as default queue (users must specify a queue; the user's name will not be
used)
Notes:
The root queue is necessary
Users in the group mygroup can submit to myqueue
The default queue will only allow root to submit, but additional configuration is required to allow root
to submit mapreduce jobs
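The queue policy described in the notes above could be expressed in the allocation file along these lines (a sketch only; the queue names come from the notes, and the ACLs must be adapted to the environment):

Config file
/opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop/fair-scheduler.xml:

```xml
<?xml version="1.0"?>
<allocations>
  <queue name="root">
    <queue name="myqueue">
      <!-- no users before the space; members of mygroup may submit -->
      <aclSubmitApps> mygroup</aclSubmitApps>
    </queue>
    <queue name="default">
      <!-- only the root user may submit -->
      <aclSubmitApps>root </aclSubmitApps>
    </queue>
  </queue>
</allocations>
```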
14. MapReduce v1 API Changes for 4.0.x
Existing compiled MapReduce V1 applications may need to be recompiled before they can be run as
MapReduce V1 applications in MapR Version 4.0.x. The small number of API changes that have been
made, including removal of classes and methods and conversion of classes to interfaces, are
documented here. If your application does not use any of the changes listed in this document, you do
not need to recompile the application.
When an application has been compiled against MapReduce V1 or MapReduce V2 (YARN) in MapR 4.0.x,
the application can be run in either mode.
The following list of changes is grouped by package name:
org.apache.hadoop.mapred.jobcontrol
Job
extends ControlledJob
getMapredJobID: return type changed from String to JobID
JobControl
extends org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl
String addJob(ControlledJob) instead of String addJob(Job), where Job extends ControlledJob
org.apache.hadoop.mapred
JobContext
Extends AbstractCounters, and some methods have been type-parameterized using generics. This
change breaks binary compatibility.
TaskLog
Extends AbstractCounters, and some methods have been type-parameterized using generics. This
breaks binary compatibility.
Job
ReduceContext.ValueIterator
Change in signature for some methods (removed arg: signal; now just takes
pid): killProcess, killProcessGroup, etc.
15. Submitting Jobs to MR1 or MR2
If no recompiling of application code is necessary, the job can be submitted to either MR1 or MR2.
Note: -Dmapred.job.queue.name=<queue> is deprecated in MR2 (YARN), but will still function.
MR1 job:
$> hadoop1 jar <path_to_jar_file> -Dmapred.job.queue.name=<queue>
MR2 job:
$> hadoop2 jar <path_to_jar_file> -Dmapreduce.job.queuename=<queue>
Notes: Even though the MR2 (YARN) fair scheduler has hierarchical queues, the short queue name can be
used (i.e., myqueue instead of root.myqueue)
16. Edge Node Configuration
The edge nodes can be configured with either MR1 (classic) or MR2 (YARN) as the default. The edge
node services (hiveserver2, oozie, etc.) will need to start with one or the other, so choices need to be
made on how best to run services. For example, one edge node can be configured for MR1 and
another for MR2. A more complicated approach is to run multiple instances of the same service
on different ports.
For example, Hiveserver2: run MR1 on the standard port 10000 and run MR2 on a separate port 10001.
This should be tested for each service (hiveserver2, oozie, etc.) prior to rolling out.
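As a sketch of the dual-instance approach for Hiveserver2, the second instance's port can be set via hive.server2.thrift.port (a standard HiveServer2 property; the port value is the example from above):

Config file
hive-site.xml (second HiveServer2 instance):

```xml
<property>
  <name>hive.server2.thrift.port</name>
  <value>10001</value>
</property>
```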
Additional ports will need to be opened in order to interoperate with the ResourceManager, the
NodeManagers, the Application Master, and the History Server. The ResourceManager web port (8088)
will only be active on a single server at a time. The Application Master is assigned per application and
can run on any NodeManager in the cluster.
Here is the complete port listing for MapR v4.0.2 with default port numbers:
Service | Port(s)
CLDB | 7222, 7220, 7221
DNS | 53
HBase Master | 60000, 60010
HBase RegionServer | 60020
HBase Thrift | 9090
HistoryServer RPC | 10020
HistoryServer web | 19888
Hive Metastore | 9083
Hiveserver2 | 10000
Httpfs | 14000
Hue Beeswax | 8002
Hue Webserver | 8888
Impala Catalog Daemon | 25020
Impala Daemon | 21000, 21050, 25000
Impala StateStoreDaemon | 25010
JobTracker | 9001
JobTracker web | 50030
LDAP | 389
LDAPS | 636
- | 1111
MFS server | 5660
MySQL | 3306
NFS | 2049, 9997
NFS management | 9998
NodeManager | 8041, 8040, 8042
NTP | 123
Oozie | 11000
Port mapper | 111
ResourceManager | 8033, 8032, 8031, 8030, 8088
HistoryServer web HTTPS | 19890
NodeManager HTTPS | 8044
ResourceManager web HTTPS | 8090
Shuffle HTTP | 13562
SMTP | 25
Sqoop2 Server | 12000
SSH | 22
TaskTracker web | 50060
Web UI HTTPS | 8443
- | 8080
ZooKeeper | 5181, 2888, 3888
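Before cutover, connectivity to a few of the key ports can be spot-checked with a small script (a sketch; the hostname is a placeholder):

```python
import socket

def check_port(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder host; ports taken from the listing above
required = {"CLDB": 7222, "ResourceManager web": 8088, "ZooKeeper": 5181}
for name, port in sorted(required.items()):
    print(name, check_port("node1.example.com", port))
```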
Operator   Description
&          AND operation
|          OR operation
()         Delimiters for subexpressions
""         The empty string indicates that no user has the specified permission.
An example definition is u:1001 | r:engineering, which restricts access to the user with ID 1001 or to
any user with the role engineering.
In this next example, members of the group admin are given access, and so are members of the
group qa:
g:admin | g:qa
For another example, suppose that you have this list of groups to which you want to give read
permissions on a table:
The admin group as a whole, but not the admins for a particular cluster (which is named cl3).
Members of the qa group who are responsible for testing the two applications
(named app2 and app3) that access this table.
The business analysts (group ba) in department 7A (group dept_7a)
All of the data scientists (group ds) in the company.
To grant the read permission, you construct this boolean expression:
u:cfkane | (g:admin & g:!cl3) | (g:qa & (g:app2 | g:app3)) | (g:ba & g:dept_7a) | g:ds
This expression is made up of five subexpressions which are separated by OR operators.
The first subexpression u:cfkane grants the read permission to the username cfkane.
The subexpression (g:admin & g:!cl3) grants the read permission to the admins for all
clusters except cluster cl3. The operator g is the group operator, the value admin is the
name of the group of all admins. The & operator limits the number of administrators who
have read permission because only those administrators who meet the additional condition
will have it.
The condition g:!cl3 is a limiting condition. The operator ! is the NOT operator. Combined with the
group operator, this operator means that this group is excluded and does not receive the read
permission.
Be careful when using the NOT operator. You might exclude fewer people than you intended. For
example, suppose that you do not want anyone in the group group_a to have access. You therefore
define this ACE:
g:!group_a
You might think that the data in your table is now protected because members of group_a do not
have access to it. However, you have not restricted access for anyone else except the members
of group_a. The rest of the world can access the table.
You should not define ACEs through exclusion by using the NOT operator. You should define them by
inclusion and use the NOT operator to limit further the access of the groups or roles that you have
included.
The subexpression (g:qa & (g:app2 | g:app3)) demonstrates that you can use a
subexpression within a subexpression. The larger subexpression means that only members
of group qa who are also members of group app2 or app3 have read access to the table. The
smaller subexpression limits the number of people in the qa group who have this
permission.
The final two subexpressions -- (g:ba & g:dept_7a) and g:ds -- grant the read permission to
the members of group ba who are also in the group dept_7a, and to the members of the
group ds.
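The evaluation semantics above can be modeled with a short Python sketch (illustrative only; this is not MapR's implementation, and it supports only the u:/g: operands, !, &, |, and parentheses used in these examples):

```python
import re

def ace_matches(expr, user, groups):
    """Evaluate a simplified ACE against a user and their set of groups."""
    def operand(m):
        kind, neg, name = m.group(1), m.group(2), m.group(3)
        member = (name == user) if kind == "u" else (name in groups)
        return str(member != bool(neg))  # '!' flips the membership test
    # Rewrite each operand to True/False, then the operators to Python syntax.
    py = re.sub(r"\b([ug]):(!?)([A-Za-z0-9_]+)", operand, expr)
    py = py.replace("&", " and ").replace("|", " or ")
    return eval(py)  # safe here only because we built the string ourselves

ace = "u:cfkane | (g:admin & g:!cl3) | (g:qa & (g:app2 | g:app3)) | (g:ba & g:dept_7a) | g:ds"
print(ace_matches(ace, "alice", {"admin"}))        # admin, not in cl3 -> True
print(ace_matches(ace, "bob", {"admin", "cl3"}))   # cl3 admin -> False
print(ace_matches(ace, "carol", {"qa"}))           # qa but not app2/app3 -> False
```

Note how the g:!cl3 warning from above falls out of the model: g:!group_a alone is True for everyone outside group_a.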
2. Click the arrow at the right side of any field to display the Expression Builder for that field.
You can also type expressions directly into the field. The MCS validates expressions when focus
leaves the field. The field is colored yellow for a warning and red for an error. Hover the cursor on
the field to display the error or warning message.
Defining ACEs by using maprcli commands
You can set ACEs with the following commands:
table create: Creates a new MapR table.
table edit: Edits a MapR table.
table cf create: Creates a column family for a MapR table.
table cf edit: Edits a column-family definition.
table cf colperm set: Sets Access Control Expressions (ACEs) for a specified column.
When you create a new MapR table from the MapR Control System (MCS), check the Bulk Load box
under Table Properties.
CopyTableTest
com.mapr.fs.hbase.tools.CopyTableTest
CopyTable
com.mapr.fs.hbase.tools.mapreduce.CopyTable
ImportFiles
com.mapr.fs.hbase.tools.mapreduce.ImportFiles
If you are running on an HBase 0.98 client but the exported files were generated with HBase 0.94,
include -Dhbase.import.version=0.94 in the ImportFiles job.
Available for MapR Tables?
Yes
void close()
Yes
void createTable(HTableDescriptor desc, byte[][] splitKeys)
Yes
Yes
Yes
Yes
HTableDescriptor[] deleteTables(Pattern pattern)
Yes
Configuration getConfiguration()
Yes
HTableDescriptor getTableDescriptor(byte[] tableName)
Yes
HTableDescriptor[] getTableDescriptors(List<String> tableNames)
Yes
boolean isTableAvailable(String tableName)
Yes
boolean isTableDisabled(String tableName)
Yes
boolean isTableEnabled(String tableName)
Yes
HTableDescriptor[] listTables()
Yes
Yes
No
Yes
Pair<Integer, Integer> getAlterStatus(byte[] tableName)
Yes
CompactionState getCompactionState(String tableNameOrRegionName)
Yes
Returns CompactionState.NONE.
void split(byte[] tableNameOrRegionName)
Yes
The tableNameOrRegionName parameter has a different
No
No
boolean balancer()
No
boolean balanceSwitch(boolean b)
No
No
No
boolean closeRegionWithEncodedRegionName(String encodedRegionName, String serverName)
No
void flush(String tableNameOrRegionName)
No
ClusterStatus getClusterStatus()
No
HConnection getConnection()
No
HMasterInterface getMaster()
No
String[] getMasterCoprocessors()
No
boolean isAborted()
No
boolean isMasterRunning()
No
void majorCompact(String tableNameOrRegionName)
No
No
byte[][] rollHLogWriter(String serverName)
No
boolean setBalancerRunning(boolean on, boolean synchronous)
No
void shutdown()
No
void stopMaster()
No
void stopRegionServer(String hostnamePort)
No
No
HTable API
Available for MapR Tables?
Comments
void clearRegionCache()
No
void close()
Yes
No
Returns null.
No
Returns null.
Map<HRegionInfo, HServerAddress> deserializeRegionInfo(DataInput in)
Yes
void flushCommits()
Yes
Configuration getConfiguration()
Yes
HConnection getConnection()
No
Returns null
int getOperationTimeout()
No
Returns null
ExecutorService getPool()
No
Returns null
int getScannerCaching()
No
Returns 0
ArrayList<Put> getWriteBuffer()
No
Returns null
long getWriteBufferSize()
No
Returns 0
boolean isAutoFlush()
Yes
void prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap)
No
void serializeRegionInfo(DataOutput out)
Yes
Same as setAutoFlush(boolean autoFlush)
Yes
Yes
boolean shouldFlushOnRead()
Yes
void setOperationTimeout(int operationTimeout)
No
void setScannerCaching(int scannerCaching)
No
void setWriteBufferSize(long writeBufferSize)
No
Atomic operations
Result append(Append append)
Yes
Yes
Yes
Yes
long incrementColumnValue(byte[] row, byte[] family, byte[] qualifier, long amount, boolean writeToWAL)
Yes
long incrementColumnValue(byte[] row, byte[] family, byte[] qualifier, long amount)
Yes
Yes
DML operations
void batch(List actions, Object[] results)
Yes
Yes
Yes
Yes
Yes
Yes
Yes
No
ResultScanner getScanner(...)
Yes
Yes
Yes
Yes
Map<HRegionInfo, HServerAddress> getRegionsInfo()
Yes
List<HRegionLocation> getRegionsInRange(byte[] startKey, byte[] endKey)
Yes
byte[][] getEndKeys()
Yes
byte[][] getStartKeys()
Yes
Pair<byte[][], byte[][]> getStartEndKeys()
Yes
HTableDescriptor getTableDescriptor()
Yes
byte[] getTableName()
Yes
Row Locks
RowLock lockRow(byte[] row)
No
No
HTablePool API
Available for MapR Tables?
close()
Yes
closeTablePool(byte[] tableName)
Yes
closeTablePool(String tableName)
Yes
protected HTableInterface createHTable(String tableName)
Yes
int getCurrentPoolSize(String tableName)
Yes
HTableInterface getTable(byte[] tableName)
Yes
HTableInterface getTable(String tableName)
Yes
Yes
Description
ColumnCountGetFilter
ColumnPaginationFilter
ColumnPrefixFilter
ColumnRangeFilter
CompareFilter
FilterList
FirstKeyOnlyFilter
FirstKeyValueMatchingQualifiersFilter
FuzzyRowFilter
InclusiveStopFilter
KeyOnlyFilter
MultipleColumnPrefixFilter
PageFilter
PrefixFilter
RandomRowFilter
RegexStringComparator
SingleColumnValueFilter
SkipFilter
TimestampsFilter
WhileMatchFilter
Available for MapR Tables?
alter
Yes
alter_async
Yes
create
Yes
describe
Yes
disable
Yes
drop
Yes
enable
Yes
exists
Yes
is_disabled
Yes
is_enabled
Yes
list
Yes
disable_all
Yes
drop_all
No
enable_all
Yes
show_filters
Yes
count
Yes
get
Yes
put
Yes
scan
Yes
delete
Yes
deleteall
Yes
incr
Yes
truncate
Yes
get_counter
Yes
assign
No
balance_switch
No
balancer
No
close_region
No
major_compact
No
move
No
unassign
No
zk_dump
No
status
No
version
Yes
whoami
Yes
19. Apache Hive
We plan to keep the current major version of Hive in place for the upgrade. Since hiveserver2 is
already configured with impersonation, no specific configuration changes should be necessary. The
Hive metastore should be backed up prior to the upgrade. The newer version of Hive works with both
MR1 and MR2, and Hive will follow the environment variable in /opt/mapr/conf/hadoop_version (by
default, hiveserver2 will start against MR1 or MR2 based on hadoop_version).
For the upgrade, the existing hive-0.12 package should be uninstalled and the new version installed.
This ensures the proper hadoop jar files are in place.
Changes to hive-site.xml should include the updated jar files. In addition, the HBase jar files are now
split into three files, which will need to be updated:
hbase-client-0.98.7-mapr-1501.jar
hbase-common-0.98.7-mapr-1501.jar
hbase-protocol-0.98.7-mapr-1501.jar
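For example, if the HBase jars are wired in through hive.aux.jars.path (a standard Hive property; the paths below assume MapR's default layout and may differ in the environment):

Config file
hive-site.xml:

```xml
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///opt/mapr/hbase/hbase-0.98.7/lib/hbase-client-0.98.7-mapr-1501.jar,file:///opt/mapr/hbase/hbase-0.98.7/lib/hbase-common-0.98.7-mapr-1501.jar,file:///opt/mapr/hbase/hbase-0.98.7/lib/hbase-protocol-0.98.7-mapr-1501.jar</value>
</property>
```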
The following Hive patches are included in the latest version:
Commit    Date (YYYY-MM-DD)    Comment
4441453   2015-01-19
a3e0516   2015-01-14
58b1655   2014-12-30
c414fc8   2014-11-21
491ec41   2014-11-20
415b1b5   2014-10-15
c5854a7   2014-11-03
5a2322a   2014-11-03
c19ade2   2014-09-04
3cef19f   2014-08-06
f301d59   2014-09-02
20. Apache Pig
We plan to keep the current major version of Pig in place for the upgrade. As with Hive, Pig
will follow the default MR1 / MR2 configuration set in /opt/mapr/conf/hadoop_version. This can be
overridden via script or environment variable. No further Pig configuration changes will be necessary.
The following Pig patches have been included in the latest version:
Commit    Date (YYYY-MM-DD)    Comment
39efebb   2014-08-21
d017b26   2014-08-19
c6fbb42   2014-06-27           followed by the RANK operator.
21. Apache Flume
We will be required to upgrade Apache Flume from 1.4.0 to 1.5.0 as part of the MapR v4.0.2 upgrade.
Flume 1.5.0 should be fully backwards compatible with 1.4.0 and should not require additional
configuration.
Below is a list of new features available in Flume 1.5.0:
3. During the script execution, you should see the following log messages:
POST_YARN=1, HBASE_VERSION=0.98.7: installing flume *hbase.98h2 jars
Commit    Date (YYYY-MM-DD)    Comment
7096e0f   2014-12-11           Added tmp extensions to prevent the JVM from using jar files from other versions of HBase.
e544e94   2014-12-04
334a2bc   2014-12-04
22. Apache Sqoop
Apache Sqoop will remain at the 1.4.4 base version. No configuration file updates should be necessary.
The existing version of Sqoop will need to be uninstalled and the new version installed.
The latest Sqoop 1.4.4 package contains the following patches:
Commit    Date (YYYY-MM-DD)    Comment
-         2014-10-14
303ad67   2014-08-27
545068f   2014-08-27
7e4d225   2014-06-14
f54e037   2013-11-20           Fix for MapR 12083. Sqoop can find the Kerberos tgt file.
6825655   2013-12-02
23. Apache Mahout
Apache Mahout requires an upgrade from 0.7 to 0.9. No real configuration changes will be necessary
aside from updating the paths, but as we are jumping two full versions, some existing jobs may be
deprecated.
Below is a list of changes from 0.8 to 0.9:
You also need to copy the yarn-site.xml file for the active ResourceManager to the following location:
/opt/mapr/oozie/oozie-<version>/conf/hadoop-conf
Note: With MapR v4.0.2, the share libs are copied automatically when you first start Oozie 4.0.1.
The latest Oozie package contains the following patches:
Commit    Date (YYYY-MM-DD)    Comment
2973bc3   2014-11-20
911f78a   2014-11-04
e6230c    2014-10-30
25. Apache HBase
Apache HBase will require an upgrade from 0.94.12 to 0.98.7 for MapR v4.0.2. This will require an
upgrade to the HFile format on disk. It is highly recommended to back up important tables to another
cluster prior to the upgrade.
Here is a summary of the steps needed to upgrade HBase:
1. Install the HBase 0.98.7 binaries.
2. Use the 0.98.7 binary to check for incompatible files on the running 0.94.x cluster. (The purpose of
this step is to identify HFiles in the incompatible format.)
$> /opt/mapr/hbase/hbase-0.98.7/bin/hbase upgrade -check
If incompatible files are found, you must purge them by running a compaction.
3. Shut down the HBase 0.94.13 services on the MapR cluster.
4. Execute the upgrade on the cluster:
$ /opt/mapr/hbase/hbase-0.98.7/bin/hbase upgrade -execute
5. Start the upgraded HBase services.
The Apache upgrade documentation can be found here:
http://hbase.apache.org/book.html#upgrade0.96
Note: Upgrading from 0.94.x to 0.98.x is the same as upgrading from 0.94.x to 0.96.x.
This is the initial release of HBase 0.98.7 for the MapR Distribution for Hadoop. It includes the following
new features:
1. C APIs for HBase (libhbase). This library is not supported by MapR-DB.
2. HTable.checkAndMutate(). This API is not supported by MapR-DB.
3. Impersonation for HBase REST gateway with MapR-DB tables is supported on a MapR 4.0.2 cluster.
26. Apache Cascading
Apache Cascading will require an upgrade from 2.1 to 2.5 for MapR v4.0.2. Both MR1 and MR2 are
supported with Cascading 2.5. Users are advised to test existing code with the newer version of Cascading.
Placeholder: additional information needed
27. Apache Whirr
Apache Whirr is merely a set of libraries for running cloud services. There are currently no Amex
projects using this software. It is recommended to remove Whirr prior to the upgrade.
28. Apache Zookeeper
With MapR v3.1.1 and later, MapR uses Zookeeper 3.4.5 instead of 3.3.6. Version 3.4.5 requires no
additional configuration and should be compatible with other open source projects such as Solr
and Storm. Prior to the upgrade, it is recommended to back up /opt/mapr/zkdata on each node
configured to run Zookeeper.
Deploy Cluster
Deploy the cluster via StackIQ. Resolve any hardware or build issues.
Halt Jobs
As defined by your upgrade plan, halt activity on the cluster in the following sequence before you
begin upgrading packages:
1. Notify stakeholders.
2. Stop accepting new jobs.
3. Terminate any running jobs.
The following commands can be used to list and terminate MapReduce jobs:
# hadoop job -list
# hadoop job -kill <job-id>
# hadoop job -kill-task <task-id>
3. Stop zookeeper:
$> sudo clush -g zk service mapr-zookeeper stop
mapr-cldb
mapr-core
mapr-fileserver
mapr-jobtracker
mapr-metrics
mapr-nfs
mapr-tasktracker
mapr-webserver
mapr-zookeeper
mapr-zk-internal
$> sudo clush -a yum update -y mapr-cldb mapr-core mapr-fileserver mapr-jobtracker mapr-metrics mapr-nfs mapr-tasktracker mapr-webserver mapr-zookeeper mapr-zk-internal
After the cluster comes up, enable the CLDB v4 features from the command line
(as root or with sudo, on a single node):
$> sudo maprcli config save -values {cldb.v4.features.enabled:1}
The reduction of the on-disk container size will take effect after the CLDB service restarts or fails over.
The same tests run in week 2 should be run again to validate functionality in
both MR1 and MR2.