commit 0f4475cfaaf80e34ea5496a93ce579fe8c6950d5
Author: Greg Kroah-Hartman
Date:   Mon Nov 26 11:40:05 2012 -0800

    Linux 3.4.20

commit d0e85e04fb57a65a6096a0b18c97ba5892d676d9
Author: Alex Elder
Date:   Thu Jun 21 12:49:23 2012 -0700

    libceph: drop declaration of ceph_con_get()

    commit 261030215d970c62f799e6e508e3c68fc7ec2aa9 upstream.

    For some reason the declaration of ceph_con_get() and ceph_con_put() did not get deleted in this commit:

        d59315ca libceph: drop ceph_con_get/put helpers and nref member

    Clean that up.

    Signed-off-by: Alex Elder
    Cc: Herton Ronaldo Krzesinski
    Signed-off-by: Greg Kroah-Hartman

commit 51b8318a818623899f9eb24ce697d43301bbe349
Author: Felipe Balbi
Date:   Tue Oct 16 17:09:22 2012 +0300

    Revert "serial: omap: fix software flow control"

    commit a4f743851f74fc3e0cc40c13082e65c24139f481 upstream.

    This reverts commit 957ee7270d632245b43f6feb0e70d9a5e9ea6cf6 (serial: omap: fix software flow control).

    As Russell has pointed out, that commit isn't fixing Software Flow Control at all, and it actually makes it even more broken.

    It was agreed to revert this commit and use Russell's latest UART patches instead.

    Signed-off-by: Felipe Balbi
    Cc: Russell King
    Acked-by: Tony Lindgren
    Cc: Andreas Bießmann
    Signed-off-by: Greg Kroah-Hartman

commit 8217df07c9d8debd39a6d8e1e2271e97a3c899c7
Author: Igor Murzov
Date:   Sat Oct 13 04:41:25 2012 +0400

    ACPI video: Ignore errors after _DOD evaluation.

    commit fba4e087361605d1eed63343bb08811f097c83ee upstream.

    There are systems where the video module is known to work fine regardless of a broken _DOD, and ignoring the returned value here doesn't cause any issues later. This should fix brightness controls on some laptops.
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=47861
    Signed-off-by: Igor Murzov
    Reviewed-by: Sergey V
    Signed-off-by: Zhang Rui
    Signed-off-by: Abdallah Chatila

commit b7d68a7434bbfd57049927ca713d57e44e0eee04
Author: Alex Elder
Date:   Tue Oct 2 10:25:51 2012 -0500

    ceph: avoid 32-bit page index overflow

    (cherry picked from commit 6285bc231277419255f3498d3eb5ddc9f8e7fe79)

    A pgoff_t is defined (by default) to have type (unsigned long). On architectures such as i686 that's a 32-bit type. The ceph address space code was attempting to produce 64 bit offsets by shifting a page's index by PAGE_CACHE_SHIFT, but the result was not what was desired because the shift occurred before the result got promoted to 64 bits.

    Fix this by converting all uses of page->index used in this way to use the page_offset() macro, which ensures the 64-bit result has the intended value.

    This fixes http://tracker.newdream.net/issues/3112

    Reported-by: Mohamed Pakkeer
    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit dfae3b3451c6da14df1fa62d76c8a4345d21bdb2
Author: Sage Weil
Date:   Mon Sep 24 20:59:48 2012 -0700

    libceph: check for invalid mapping

    (cherry picked from commit d63b77f4c552cc3a20506871046ab0fcbc332609)

    If we encounter an invalid (e.g., zeroed) mapping, return an error and avoid a divide by zero.

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder
    Signed-off-by: Greg Kroah-Hartman

commit 73bba6fc44591587254fec8e867a99b5a2a28ba7
Author: Yan, Zheng
Date:   Thu Sep 20 17:42:25 2012 +0800

    ceph: Fix oops when handling mdsmap that decreases max_mds

    (cherry picked from commit 3e8f43a089f06279c5f76a9ccd42578eebf7bfa5)

    When i >= newmap->m_max_mds, ceph_mdsmap_get_addr(newmap, i) returns NULL. Passing NULL to memcmp() triggers an oops.
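The failure mode described just above can be illustrated with a minimal user-space sketch. It assumes a simplified map type: `toy_mdsmap`, `toy_get_addr()` and `toy_addr_changed()` are hypothetical stand-ins invented for this note, not the kernel's mdsmap API, and the real fix may be structured differently. The lookup returns NULL for an out-of-range index, so the comparison checks for NULL before calling memcmp().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_MDS 4

/* Hypothetical stand-in for an MDS map; not the kernel structure. */
struct toy_mdsmap {
	int m_max_mds;                  /* number of valid entries */
	char addr[TOY_MAX_MDS][16];     /* one address blob per MDS */
};

/* Like the lookup described above: NULL when i is out of range. */
static const char *toy_get_addr(const struct toy_mdsmap *m, int i)
{
	if (i < 0 || i >= m->m_max_mds)
		return NULL;
	return m->addr[i];
}

/* Returns 1 if MDS i is gone or has a different address in newmap. */
static int toy_addr_changed(const struct toy_mdsmap *oldmap,
			    const struct toy_mdsmap *newmap, int i)
{
	const char *a = toy_get_addr(oldmap, i);
	const char *b = toy_get_addr(newmap, i);

	assert(a != NULL);              /* caller iterates over oldmap */
	if (b == NULL)                  /* i >= newmap->m_max_mds */
		return 1;
	return memcmp(a, b, sizeof(oldmap->addr[0])) != 0;
}
```

With an old map of two entries and a new map of one, index 1 is reported as changed without memcmp() ever seeing a NULL pointer.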
    Signed-off-by: Yan, Zheng
    Signed-off-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit 631015e45ee3bdcde1fe75e7d04fdfece6e42016
Author: Sage Weil
Date:   Wed Oct 24 16:12:58 2012 -0700

    libceph: avoid NULL kref_put when osd reset races with alloc_msg

    (cherry picked from commit 9bd952615a42d7e2ce3fa2c632e808e804637a1a)

    The ceph_con_in_msg_alloc() method drops con->mutex while it allocates a message. If that races with a timeout that resends a zillion messages and resets the connection, and the ->alloc_msg() method returns a NULL message, it will call ceph_msg_put(NULL) and BUG.

    Fix by only calling put if msg is non-NULL.

    Fixes http://tracker.newdream.net/issues/3142

    Signed-off-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit a872024581f2e73edbea6eece56361ce508ea881
Author: Alex Elder
Date:   Mon Oct 8 20:37:30 2012 -0700

    rbd: reset BACKOFF if unable to re-queue

    (cherry picked from commit 588377d6199034c36d335e7df5818b731fea072c)

    If ceph_fault() is unable to queue work after a delay, it sets the BACKOFF connection flag so con_work() will attempt to do so.

    In con_work(), when BACKOFF is set, if queue_delayed_work() doesn't result in newly-queued work, it simply ignores this condition and proceeds as if no backoff delay were desired. There are two problems with this--one of which is a bug.

    The first problem is simply that the intended behavior is to back off, and if we aren't able to queue the work item to run after a delay we're not doing that.

    The only reason queue_delayed_work() won't queue work is if the provided work item is already queued. In the messenger, this means that con_work() is already scheduled to be run again. So if we simply set the BACKOFF flag again when this occurs, we know the next con_work() call will again attempt to hold off activity on the connection until after the delay.

    The second problem--the bug--is a leak of a reference count.
    If queue_delayed_work() returns 0 in con_work(), con->ops->put() drops the connection reference held on entry to con_work(). However, processing is (was) allowed to continue, and at the end of the function a second con->ops->put() is called.

    This patch fixes both problems.

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit 21e292e34c48c03fb6638c2d1295ca12dac97b03
Author: Alex Elder
Date:   Fri Sep 21 17:59:58 2012 -0500

    libceph: only kunmap kmapped pages

    (cherry picked from commit 5ce765a540f34d1e2005e1210f49f67fdf11e997)

    In write_partial_msg_pages(), pages need to be kmapped in order to perform a CRC-32c calculation on them. As an artifact of the way this code used to be structured, the kunmap() call was separated from the kmap() call and both were done conditionally. But the conditions under which the kmap() and kunmap() calls were made differed, so there was a chance a kunmap() call would be done on a page that had not been mapped.

    The symptom of this was tripping a BUG() in kunmap_high() when pkmap_count[nr] became 0.

    Reported-by: Bryan K. Wright
    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit 76cb69279f83889cf98fd9f16f5d50bcc2779442
Author: Jim Schutt
Date:   Fri Aug 10 10:37:38 2012 -0700

    libceph: avoid truncation due to racing banners

    (cherry picked from commit 6d4221b53707486dfad3f5bfe568d2ce7f4c9863)

    Because the Ceph client messenger uses a non-blocking connect, it is possible for the sending of the client banner to race with the arrival of the banner sent by the peer.

    When ceph_sock_state_change() notices the connect has completed, it schedules work to process the socket via con_work(). During this time the peer is writing its banner, and arrival of the peer banner races with con_work().

    If con_work() calls try_read() before the peer banner arrives, there is nothing for it to do, after which con_work() calls try_write() to send the client's banner.
    In this case Ceph's protocol negotiation can complete successfully.

    The server-side messenger immediately sends its banner and addresses after accepting a connect request, *before* actually attempting to read or verify the banner from the client. As a result, it is possible for the banner from the server to arrive before con_work() calls try_read(). If that happens, try_read() will read the banner and prepare protocol negotiation info via prepare_write_connect(). prepare_write_connect() calls con_out_kvec_reset(), which discards the as-yet-unsent client banner. Next, con_work() calls try_write(), which sends the protocol negotiation info rather than the banner that the peer is expecting.

    The result is that the peer sees an invalid banner, and the client reports "negotiation failed".

    Fix this by moving con_out_kvec_reset() out of prepare_write_connect() to its callers at all locations except the one where the banner might still need to be sent.

    [elder@inktak.com: added note about server-side behavior]

    Signed-off-by: Jim Schutt
    Reviewed-by: Alex Elder
    Signed-off-by: Greg Kroah-Hartman

commit 523693067608f067bcbe94f23a4feb4dfcfa2db8
Author: Sage Weil
Date:   Sun Aug 19 12:29:16 2012 -0700

    libceph: delay debugfs initialization until we learn global_id

    (cherry picked from commit d1c338a509cea5378df59629ad47382810c38623)

    The debugfs directory includes the cluster fsid and our unique global_id. We need to delay the initialization of the debug entry until we have learned both the fsid and our global_id from the monitor or else the second client can't create its debugfs entry and will fail (and multiple client instances aren't properly reflected in debugfs).
    Reported by: Yan, Zheng
    Signed-off-by: Sage Weil
    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Greg Kroah-Hartman

commit b8e03e320f9156e870f8cc66b0d9fca9f24d36c8
Author: Sylvain Munaut
Date:   Thu Aug 2 09:12:59 2012 -0700

    libceph: fix crypto key null deref, memory leak

    (cherry picked from commit f0666b1ac875ff32fe290219b150ec62eebbe10e)

    Avoid crashing if the crypto key payload was NULL, as when it was not correctly allocated and initialized. Also, avoid leaking it.

    Signed-off-by: Sylvain Munaut
    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder
    Signed-off-by: Greg Kroah-Hartman

commit 59238927cc8ea6c1e4a8a1136e17598648832db0
Author: Sage Weil
Date:   Mon Jul 30 18:19:45 2012 -0700

    libceph: recheck con state after allocating incoming message

    (cherry picked from commit 6139919133377652992a5fe134e22abce3e9c25e)

    We drop the lock when calling the ->alloc_msg() con op, which means we need to (a) not clobber con->in_msg without the mutex held, and (b) we need to verify that we are still in the OPEN state when we retake it to avoid causing any mayhem. If the state does change, -EAGAIN will get us back to con_work() and loop.

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder
    Signed-off-by: Greg Kroah-Hartman

commit 7389a76f02bf56340f26fa933b9fc6a1dece9148
Author: Sage Weil
Date:   Mon Jul 30 18:19:30 2012 -0700

    libceph: change ceph_con_in_msg_alloc convention to be less weird

    (cherry picked from commit 4740a623d20c51d167da7f752b63e2b8714b2543)

    This function's calling convention is very limiting. In particular, we can't return any error other than ENOMEM (and only implicitly), which is a problem (see next patch).

    Instead, return a normal 0 or error code, and make the skip a pointer output parameter. Drop the useless in_hdr argument (we have the con pointer).
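The calling convention described just above (return 0 or an error code, report "skip" through a pointer) might look roughly like the following user-space sketch. Everything here is an assumption made for illustration: `toy_con`, `toy_msg` and `toy_in_msg_alloc()` are hypothetical names, and the body does not reproduce the kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical message and connection types, for illustration only. */
struct toy_msg { int type; };
struct toy_con {
	int have_memory;        /* 0 simulates an allocation failure */
	int want_skip;          /* 1 simulates "drop this incoming message" */
	struct toy_msg slot;    /* storage handed out on success */
	struct toy_msg *in_msg; /* set only when a message was allocated */
};

/*
 * Returns 0 on success or a negative errno. On success either
 * con->in_msg is set, or *skip is 1 and the caller should discard
 * the incoming data.
 */
static int toy_in_msg_alloc(struct toy_con *con, int *skip)
{
	assert(con != NULL && skip != NULL);
	*skip = 0;
	con->in_msg = NULL;

	if (con->want_skip) {
		*skip = 1;
		return 0;
	}
	if (!con->have_memory)
		return -ENOMEM;

	con->in_msg = &con->slot;
	return 0;
}
```

Because the result is an ordinary error code, the function is free to return values other than -ENOMEM, such as the -EAGAIN used by the state recheck in the preceding entry.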
    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder
    Signed-off-by: Greg Kroah-Hartman

commit 328677c24bbae17f3f91ffce3b2351a27933741c
Author: Sage Weil
Date:   Mon Jul 30 18:17:13 2012 -0700

    libceph: avoid dropping con mutex before fault

    (cherry picked from commit 8636ea672f0c5ab7478c42c5b6705ebd1db7eb6a)

    The ceph_fault() function takes the con mutex, so we should avoid dropping it before calling it. This fixes a potential race with another thread calling ceph_con_close(), or _open(), or similar (we don't reverify con->state after retaking the lock).

    Add annotation so that lockdep realizes we will drop the mutex before returning.

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder
    Signed-off-by: Greg Kroah-Hartman

commit 900fbd910f53a417a6b0859bd2cde7ae48ac4bb2
Author: Sage Weil
Date:   Mon Jul 30 18:16:56 2012 -0700

    libceph: verify state after retaking con lock after dispatch

    (cherry picked from commit 7b862e07b1a4d5c963d19027f10ea78085f27f9b)

    We drop the con mutex when delivering a message. When we retake the lock, we need to verify we are still in the OPEN state before preparing to read the next tag, or else we risk stepping on a connection that has been closed.

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder
    Signed-off-by: Greg Kroah-Hartman

commit da75ae3c0d4c40587fa7583952b98072d811a7d2
Author: Sage Weil
Date:   Mon Jul 30 18:16:40 2012 -0700

    libceph: revoke mon_client messages on session restart

    (cherry picked from commit 4f471e4a9c7db0256834e1b376ea50c82e345c3c)

    Revoke all mon_client messages when we shut down the old connection. This is mostly moot since we are re-using the same ceph_connection, but it is cleaner.
    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder
    Signed-off-by: Greg Kroah-Hartman

commit 6cdaef1be27bc05ab4197b3ab4e3ee1d326cf04c
Author: Sage Weil
Date:   Mon Jul 30 18:16:16 2012 -0700

    libceph: fix handling of immediate socket connect failure

    (cherry picked from commit 8007b8d626b49c34fb146ec16dc639d8b10c862f)

    If the connect() call immediately fails such that sock == NULL, we still need con_close_socket() to reset our socket state to CLOSED.

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder
    Signed-off-by: Greg Kroah-Hartman

commit 8992551d85934e0dc4683068612758831d1d4899
Author: Sage Weil
Date:   Fri Jul 20 17:30:40 2012 -0700

    libceph: clear all flags on con_close

    Signed-off-by: Sage Weil
    (cherry picked from commit 43c7427d100769451601b8a36988ac0528ce0124)

commit 63c1362476141f4fb340e8236d41674be9fc1983
Author: Sage Weil
Date:   Fri Jul 20 17:29:55 2012 -0700

    libceph: clean up con flags

    (cherry picked from commit 4a8616920860920abaa51193146fe36b38ef09aa)

    Rename flags with CON_FLAG prefix, move the definitions into the c file, and (better) document their meaning.

    Signed-off-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit 265fb7c177f9db75d628b3479b6223c1c8110e67
Author: Sage Weil
Date:   Fri Jul 20 17:24:40 2012 -0700

    libceph: replace connection state bits with states

    (cherry picked from commit 8dacc7da69a491c515851e68de6036f21b5663ce)

    Use a simple set of 6 enumerated values for the socket states (CON_STATE_*) and use those instead of the state bits. All of the con->state checks are now under the protection of the con mutex, so this is safe. It also simplifies many of the state checks because we can check for anything other than the expected state instead of various bits for races we can think of.

    This appears to hold up well to stress testing both with and without socket failure injection on the server side.
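A rough sketch of the "check for anything other than the expected state" idea described just above. The six state names below are placeholders chosen for this note and are not copied from messenger.c; `toy_con` and `toy_recheck_state()` are likewise hypothetical, and the caller is assumed to hold the connection mutex.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Six illustrative states; names are assumptions, not the kernel's. */
enum toy_con_state {
	TOY_STATE_CLOSED,
	TOY_STATE_PREOPEN,
	TOY_STATE_CONNECTING,
	TOY_STATE_NEGOTIATING,
	TOY_STATE_OPEN,
	TOY_STATE_STANDBY,
};

struct toy_con {
	enum toy_con_state state; /* assumed to be read with the con mutex held */
};

/* After retaking the lock, proceed only if still in the expected state. */
static int toy_recheck_state(const struct toy_con *con,
			     enum toy_con_state expected)
{
	assert(con != NULL);
	if (con->state != expected)
		return -EAGAIN; /* caller loops and re-evaluates */
	return 0;
}
```

A single comparison against one enumerated value replaces a set of per-bit tests, which is the simplification the entry describes.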
    Signed-off-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit cb9f8855591613dff0909c99d46a29e10eb39b25
Author: Sage Weil
Date:   Fri Jul 20 17:19:43 2012 -0700

    libceph: drop unnecessary CLOSED check in socket state change callback

    (cherry picked from commit d7353dd5aaf22ed611fbcd0d4a4a12fb30659290)

    If we are CLOSED, the socket is closed and we won't get these.

    Signed-off-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit 3b5a9ead0b4845aaa7e80fcd6166f2c6740a1c6f
Author: Sage Weil
Date:   Fri Jul 20 16:45:49 2012 -0700

    libceph: close socket directly from ceph_con_close()

    (cherry picked from commit ee76e0736db8455e3b11827d6899bd2a4e1d0584)

    It is simpler to do this immediately, since we already hold the con mutex. It also avoids the need to deal with a not-quite-CLOSED socket in con_work.

    Signed-off-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit 1c5b33b852ab372d4538821db998561d5e4d7212
Author: Sage Weil
Date:   Fri Jul 20 15:40:04 2012 -0700

    libceph: drop gratuitous socket close calls in con_work

    (cherry picked from commit 2e8cb10063820af7ed7638e3fd9013eee21266e7)

    If the state is CLOSED or OPENING, we shouldn't have a socket.

    Signed-off-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit 5d0f354b3183d005024d43b567335701d928995a
Author: Sage Weil
Date:   Fri Jul 20 15:34:04 2012 -0700

    libceph: move ceph_con_send() closed check under the con mutex

    (cherry picked from commit a59b55a602b6c741052d79c1e3643f8440cddd27)

    Take the con mutex before checking whether the connection is closed to avoid racing with someone else closing it.

    Signed-off-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit 67c6fc1932a705f61f98d8092375622b8dffea6a
Author: Sage Weil
Date:   Fri Jul 20 15:33:04 2012 -0700

    libceph: move msgr clear_standby under con mutex protection

    (cherry picked from commit 00650931e52e97fe64096bec167f5a6780dfd94a)

    Avoid dropping and retaking con->mutex in the ceph_con_send() case by leaving locking up to the caller.
    Signed-off-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit da6a81eb5a2671f12551953d1843a0f53416d185
Author: Sage Weil
Date:   Fri Jul 20 15:22:53 2012 -0700

    libceph: fix fault locking; close socket on lossy fault

    (cherry picked from commit 3b5ede07b55b52c3be27749d183d87257d032065)

    If we fault on a lossy connection, we should still close the socket immediately, and do so under the con mutex.

    We should also take the con mutex before printing out the state bits in the debug output.

    Signed-off-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit ed447f04ee3240371aa8a199588107fd521240d3
Author: Sage Weil
Date:   Mon Jul 30 16:22:05 2012 -0700

    libceph: reset connection retry on successfully negotiation

    (cherry picked from commit 85effe183dd45854d1ad1a370b88cddb403c4c91)

    We exponentially back off when we encounter connection errors. If several errors accumulate, we will eventually wait ages before even trying to reconnect.

    Fix this by resetting the backoff counter after a successful negotiation/connection with the remote node. Fixes ceph issue #2802.

    Signed-off-by: Sage Weil
    Reviewed-by: Yehuda Sadeh
    Reviewed-by: Alex Elder
    Signed-off-by: Greg Kroah-Hartman

commit eea8ed97626254429d4679c567035effdf9983a8
Author: Sage Weil
Date:   Mon Jul 30 16:21:40 2012 -0700

    libceph: protect ceph_con_open() with mutex

    (cherry picked from commit 5469155f2bc83bb2c88b0a0370c3d54d87eed06e)

    Take the con mutex while we are initiating a ceph open. This is necessary because the con may have previously been in use and then closed, which could result in a racing workqueue running con_work().
    Signed-off-by: Sage Weil
    Reviewed-by: Yehuda Sadeh
    Reviewed-by: Alex Elder
    Signed-off-by: Greg Kroah-Hartman

commit 175c4c20d29322b9e882c052025e72644ca00e2d
Author: Sage Weil
Date:   Mon Jul 30 16:20:25 2012 -0700

    libceph: (re)initialize bio_iter on start of message receive

    (cherry picked from commit a4107026976f06c9a6ce8cc84a763564ee39d901)

    Previously, we were opportunistically initializing the bio_iter if it appeared to be uninitialized in the middle of the read path. The problem is that a sequence like:

     - start reading message
     - initialize bio_iter
     - read half a message
     - messenger fault, reconnect
     - restart reading message
     - ** bio_iter now non-NULL, not reinitialized **
     - read past end of bio, crash

    Instead, initialize the bio_iter unconditionally when we allocate/claim the message for read.

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder
    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Greg Kroah-Hartman

commit d43841ef768a78f2cbc7e819f1a2838a4838f566
Author: Sage Weil
Date:   Mon Jul 30 16:19:28 2012 -0700

    libceph: resubmit linger ops when pg mapping changes

    (cherry picked from commit 6194ea895e447fdf4adfd23f67873a32bf4f15ae)

    The linger op registration (i.e., watch) modifies the object state. As such, the OSD will reply with success if it has already applied without doing the associated side-effects (setting up the watch session state). If we lose the ACK and resubmit, we will see success but the watch will not be correctly registered and we won't get notifies.

    To fix this, always resubmit the linger op with a new tid. We accomplish this by re-registering as a linger (i.e., 'registered') if we are not yet registered. Then the second loop will treat this just like a normal case of re-registering.

    This mirrors a similar fix on the userland ceph.git, commit 5dd68b95, and ceph bug #2796.
    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder
    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Greg Kroah-Hartman

commit 7621822a64ee40c0ab4181e7281ef06f241782cb
Author: Sage Weil
Date:   Mon Jul 30 16:24:37 2012 -0700

    libceph: fix mutex coverage for ceph_con_close

    (cherry picked from commit 8c50c817566dfa4581f82373aac39f3e608a7dc8)

    Hold the mutex while twiddling all of the state bits to avoid possible races. While we're here, make note of why we cannot close the socket directly.

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder
    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Greg Kroah-Hartman

commit b3fd00b73452353444c768b19bb175ec95410c8f
Author: Sage Weil
Date:   Mon Jul 30 16:24:21 2012 -0700

    libceph: report socket read/write error message

    (cherry picked from commit 3a140a0d5c4b9e35373b016e41dfc85f1e526bdb)

    We need to set error_msg to something useful before calling ceph_fault(); do so here for try_{read,write}(). This is more informative than

        libceph: osd0 192.168.106.220:6801 (null)

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder
    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Greg Kroah-Hartman

commit 59d02721bb2838893596d5617659fe907dd45518
Author: Guanjun He
Date:   Sun Jul 8 19:50:33 2012 -0700

    libceph: prevent the race of incoming work during teardown

    (cherry picked from commit a2a3258417eb6a1799cf893350771428875a8287)

    Add an atomic variable 'stopping' as flag in struct ceph_messenger, set this flag to 1 in function ceph_destroy_client(), and add the condition code in function ceph_data_ready() to test the flag value, if true(1), just return.

    Signed-off-by: Guanjun He
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit 9beb73fcb83317786226b4203d7e56c6b0f43adb
Author: Sage Weil
Date:   Mon Jul 9 14:22:34 2012 -0700

    libceph: initialize msgpool message types

    (cherry picked from commit d50b409fb8698571d8209e5adfe122e287e31290)

    Initialize the type field for messages in a msgpool. The caller was doing this for osd ops, but not for the reply messages.
    Reported-by: Alex Elder
    Signed-off-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit d92d11da1dd8531150823ff429ae29a0cf5e438d
Author: Sage Weil
Date:   Wed Jun 27 12:31:02 2012 -0700

    libceph: allow sock transition from CONNECTING to CLOSED

    (cherry picked from commit fbb85a478f6d4cce6942f1c25c6a68ec5b1e7e7f)

    It is possible to close a socket that is in the OPENING state. For example, it can happen if ceph_con_close() is called on the con before the TCP connection is established. con_work() will come around and shut down the socket.

    Signed-off-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit a5b0662bb814837139bb73249463e199528101b5
Author: Sage Weil
Date:   Wed Jun 27 12:24:34 2012 -0700

    libceph: initialize mon_client con only once

    (cherry picked from commit 735a72ef952d42a256f79ae3e6dc1c17a45c041b)

    Do not re-initialize the con on every connection attempt. When we ceph_con_close, there may still be work queued on the socket (e.g., to close it), and re-initializing will clobber the work_struct state.

    Signed-off-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit 638ba1765d03bdc3a972bfca69fd0a4a4eda717c
Author: Sage Weil
Date:   Wed Jun 27 12:24:08 2012 -0700

    libceph: set peer name on con_open, not init

    (cherry picked from commit b7a9e5dd40f17a48a72f249b8bbc989b63bae5fd)

    The peer name may change on each open attempt, even when the connection is reused.

    Signed-off-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit db90f992eb77188ce3e2b95d36f99ba194e04e66
Author: Alex Elder
Date:   Wed Jun 20 21:53:53 2012 -0500

    libceph: add some fine ASCII art

    (cherry picked from commit bc18f4b1c850ab355e38373fbb60fd28568d84b5)

    Sage liked the state diagram I put in my commit description so I'm putting it in with the code.
    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit a8651271bb5d77402251bf35554d1f320463c949
Author: Alex Elder
Date:   Mon Jun 11 14:57:13 2012 -0500

    libceph: small changes to messenger.c

    (cherry picked from commit 5821bd8ccdf5d17ab2c391c773756538603838c3)

    This patch gathers a few small changes in "net/ceph/messenger.c":

      out_msg_pos_next()
        - small logic change that mostly affects indentation
      write_partial_msg_pages()
        - use a local variable trail_off to represent the offset into a message of the trail portion of the data (if present)
        - once we are in the trail portion we will always be there, so we don't always need to check against our data position
        - avoid computing len twice after we've reached the trail
        - get rid of the variable tmpcrc, which is not needed
        - trail_off and trail_len never change so mark them const
        - update some comments
      read_partial_message_bio()
        - bio_iovec_idx() will never return an error, so don't bother checking for it

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit 8a0566f6aac534a20ed436e3bed56b736cef4c40
Author: Alex Elder
Date:   Thu May 24 11:55:03 2012 -0500

    libceph: distinguish two phases of connect sequence

    (cherry picked from commit 7593af920baac37752190a0db703d2732bed4a3b)

    Currently a ceph connection enters a "CONNECTING" state when it begins the process of (re-)connecting with its peer. Once the two ends have successfully exchanged their banner and addresses, an additional NEGOTIATING bit is set in the ceph connection's state to indicate the connection information exchange has begun. The CONNECTING bit/state continues to be set during this phase.

    Rather than have the CONNECTING state continue while the NEGOTIATING bit is set, interpret these two phases as distinct states. In other words, when NEGOTIATING is set, clear CONNECTING. That way only one of them will be active at a time.
    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit 05a24ecce18262f17aa0b28a6bf1191a2a456975
Author: Alex Elder
Date:   Thu May 31 11:37:29 2012 -0500

    libceph: separate banner and connect writes

    (cherry picked from commit ab166d5aa3bc036fba7efaca6e4e43a7e9510acf)

    There are two phases in the process of linking together the two ends of a ceph connection. The first involves exchanging a banner and IP addresses, and if that is successful a second phase exchanges some detail about each side's connection capabilities.

    When initiating a connection, the client side now queues to send its information for both phases of this process at the same time. This is probably a bit more efficient, but it is slightly messier from a layering perspective in the code.

    So rearrange things so that the client doesn't send the connection information until it has received and processed the response in the initial banner phase (in process_banner()).

    Move the code (in the (con->sock == NULL) case in try_write()) that prepares for writing the connection information, delaying doing that until the banner exchange has completed. Move the code that begins the transition to this second "NEGOTIATING" phase out of process_banner() and into its caller, so preparing to write the connection information and preparing to read the response are adjacent to each other.

    Finally, preparing to write the connection information now requires the output kvec to be reset in all cases, so move that into the prepare_write_connect() and delete it from all callers.

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit a94af04be86f81d5e3973a37e6a861f329418f1e
Author: Alex Elder
Date:   Wed May 23 14:35:23 2012 -0500

    libceph: define and use an explicit CONNECTED state

    (cherry picked from commit e27947c767f5bed15048f4e4dad3e2eb69133697)

    There is no state explicitly defined when a ceph connection is fully operational. So define one.
    It's set when the connection sequence completes successfully, and is cleared when the connection gets closed.

    Be a little more careful when examining the old state when a socket disconnect event is reported.

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit abb46df87f784b398bcdb5091175d24456e42f11
Author: Alex Elder
Date:   Wed May 23 14:35:23 2012 -0500

    libceph: clear NEGOTIATING when done

    (cherry picked from commit 3ec50d1868a9e0493046400bb1fdd054c7f64ebd)

    A connection state's NEGOTIATING bit gets set while in CONNECTING state after we have successfully exchanged a ceph banner and IP addresses with the connection's peer (the server). But that bit is not cleared again--at least not until another connection attempt is initiated.

    Instead, clear it as soon as the connection is fully established. Also, clear it when a socket connection gets prematurely closed in the midst of establishing a ceph connection (in case we had reached the point where it was set).

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit 2a281c7aa696af085db44154d053a169e4b07449
Author: Alex Elder
Date:   Wed Jun 20 21:53:53 2012 -0500

    libceph: clear CONNECTING in ceph_con_close()

    (cherry picked from commit bb9e6bba5d8b85b631390f8dbe8a24ae1ff5b48a)

    A connection that is closed will no longer be connecting. So clear the CONNECTING state bit in ceph_con_close(). Similarly, if the socket has been closed we no longer are in connecting state (a new connect sequence will need to be initiated).
    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit 8a62c3339d5455c46c1da1e9fb9b60df20e59905
Author: Alex Elder
Date:   Wed Jun 20 21:53:53 2012 -0500

    libceph: don't touch con state in con_close_socket()

    (cherry picked from commit 456ea46865787283088b23a8a7f69244513b95f0)

    In con_close_socket(), a connection's SOCK_CLOSED flag gets set and then cleared while its shutdown method is called and its reference gets dropped.

    Previously, that flag got set only if it had not already been set, so setting it in con_close_socket() might have prevented additional processing being done on a socket being shut down. We no longer set SOCK_CLOSED in the socket event routine conditionally, so setting that bit here no longer provides whatever benefit it might have provided before.

    A race condition could still leave the SOCK_CLOSED bit set even after we've issued the call to con_close_socket(), so we still clear that bit after shutting the socket down. Add a comment explaining the reason for this.

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit 70b06043cd9aebf15319ce3917196ff032dc20dd
Author: Alex Elder
Date:   Wed Jun 20 21:53:53 2012 -0500

    libceph: just set SOCK_CLOSED when state changes

    (cherry picked from commit d65c9e0b9eb43d14ece9dd843506ccba06162ee7)

    When a TCP_CLOSE or TCP_CLOSE_WAIT event occurs, the SOCK_CLOSED connection flag bit is set, and if it had not been previously set queue_con() is called to ensure con_work() will get a chance to handle the changed state.

    con_work() atomically checks--and if set, clears--the SOCK_CLOSED bit if it was set. This means that even if the bit were set repeatedly, the related processing in con_work() only gets called once per transition of the bit from 0 to 1.

    What's important then is that we ensure con_work() gets called *at least* once when a socket close event occurs, not that it gets called *exactly* once.
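The test-and-clear behaviour described above can be sketched in user space with C11 atomics. This is an analogue under stated assumptions, not the kernel code: `toy_con`, `TOY_SOCK_CLOSED`, `toy_sock_closed_event()` and `toy_con_work()` are hypothetical names, and `atomic_fetch_or()` / `atomic_fetch_and()` stand in for whatever bit operations the messenger actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TOY_SOCK_CLOSED (1u << 0)   /* hypothetical flag bit */

struct toy_con {
	atomic_uint flags;
	int close_events_handled;
};

/* Event side: set the flag unconditionally; setting it twice is harmless. */
static void toy_sock_closed_event(struct toy_con *con)
{
	assert(con != NULL);
	atomic_fetch_or(&con->flags, TOY_SOCK_CLOSED);
}

/* Worker side: atomically test and clear; act once per 0 -> 1 transition. */
static void toy_con_work(struct toy_con *con)
{
	unsigned int old = atomic_fetch_and(&con->flags, ~TOY_SOCK_CLOSED);

	if (old & TOY_SOCK_CLOSED)
		con->close_events_handled++;
}
```

Two close events followed by two worker passes result in the close handling running once, which is why the event side does not need to set the flag conditionally.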
    The work queue mechanism already takes care of queueing work only if it is not already queued, so there's no need for us to call queue_con() conditionally.

    So this patch just makes it so the SOCK_CLOSED flag gets set unconditionally in ceph_sock_state_change().

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit 54942c5326f039a40c407a251175cd3925ce2951
Author: Alex Elder
Date:   Wed Jun 20 21:53:53 2012 -0500

    libceph: don't change socket state on sock event

    (cherry picked from commit 188048bce311ee41e5178bc3255415d0eae28423)

    Currently the socket state change event handler records an error message on a connection to distinguish a close while connecting from a close while a connection was already established.

    Changing connection information during handling of a socket event is not very clean, so instead move this assignment inside con_work(), where it can be done during normal connection-level processing (and under protection of the connection mutex as well).

    Move the handling of a socket closed event up to the top of the processing loop in con_work(); there's no point in handling backoff etc. if we have a newly-closed socket to take care of.

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

commit eb8c5642db57af9577ebfea7ff9e96eadbf596b8
Author: Alex Elder
Date:   Wed Jun 20 21:53:53 2012 -0500

    libceph: SOCK_CLOSED is a flag, not a state

    (cherry picked from commit a8d00e3cdef4c1c4f194414b72b24cd995439a05)

    The following commit changed it so SOCK_CLOSED bit was stored in a connection's new "flags" field rather than its "state" field.

        libceph: start separating connection flags from state
        commit 928443cd

    That bit is used in con_close_socket() to protect against setting an error message more than once in the socket event handler function.

    Unfortunately, the field being operated on in that function was not updated to be "flags" as it should have been. This fixes that error.
Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit f8920642ec1913872f1af07aa8bcb3570bf6ec77 Author: Alex Elder Date: Mon Jun 11 14:57:13 2012 -0500 libceph: don't use bio_iter as a flag (cherry picked from commit abdaa6a849af1d63153682c11f5bbb22dacb1f6b) Recently a bug was fixed in which the bio_iter field in a ceph message was not being properly re-initialized when a message got re-transmitted: commit 43643528cce60ca184fe8197efa8e8da7c89a037 Author: Yan, Zheng rbd: Clear ceph_msg->bio_iter for retransmitted message We are now only initializing the bio_iter field when we are about to start to write message data (in prepare_write_message_data()), rather than every time we are attempting to write any portion of the message data (in write_partial_msg_pages()). This means we no longer need to use the msg->bio_iter field as a flag. So just don't do that any more. Trust prepare_write_message_data() to ensure msg->bio_iter is properly initialized, every time we are about to begin writing (or re-writing) a message's bio data. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 67e5007aca705782360373d36613827ebe9e2f36 Author: Alex Elder Date: Mon Jun 11 14:57:13 2012 -0500 libceph: move init of bio_iter (cherry picked from commit 572c588edadaa3da3992bd8a0fed830bbcc861f8) If a message has a non-null bio pointer, its bio_iter field is initialized in write_partial_msg_pages() if this has not been done already. This is really a one-time setup operation for sending a message's (bio) data, so move that initialization code into prepare_write_message_data() which serves that purpose. 
Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit ec53635e8f11ddf452f69d93eb562c3c5eada75f Author: Alex Elder Date: Mon Jun 11 14:57:13 2012 -0500 libceph: move init_bio_*() functions up (cherry picked from commit df6ad1f97342ebc4270128222e896541405eecdb) Move init_bio_iter() and iter_bio_next() up in their source file so they'll be defined before they're needed. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 3b17b0bb2b1dcfa3e8aee9ce7ee5c239b1109b72 Author: Alex Elder Date: Mon Jun 11 14:57:13 2012 -0500 libceph: don't mark footer complete before it is (cherry picked from commit fd154f3c75465abd83b7a395033e3755908a1e6e) This is a nit, but prepare_write_message() sets the FOOTER_COMPLETE flag before the CRC for the data portion (recorded in the footer) has been completely computed. Hold off setting the complete flag until we've decided it's ready to send. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 3c968ed12fad908a5d3676b8d3150b9ac167841c Author: Alex Elder Date: Mon Jun 11 14:57:13 2012 -0500 libceph: encapsulate advancing msg page (cherry picked from commit 84ca8fc87fcf4ab97bb8acdb59bf97bb4820cb14) In write_partial_msg_pages(), once all the data from a page has been sent we advance to the next one. Put the code that takes care of this into its own function. While modifying write_partial_msg_pages(), make its local variable "in_trail" be Boolean, and use the local variable "msg" (which is just the connection's current out_msg pointer) consistently. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 4ecff48cef44be80eccfa0b3864d55f446006dc5 Author: Alex Elder Date: Mon Jun 11 14:57:13 2012 -0500 libceph: encapsulate out message data setup (cherry picked from commit 739c905baa018c99003564ebc367d93aa44d4861) Move the code that prepares to write the data portion of a message into its own function.
Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 9021a42c794bf96156be9ad556ef707814a361ff Author: Sage Weil Date: Thu Jun 21 12:49:23 2012 -0700 libceph: drop ceph_con_get/put helpers and nref member (cherry picked from commit d59315ca8c0de00df9b363f94a2641a30961ca1c) These are no longer used. Every ceph_connection instance is embedded in another structure, and refcounts manipulated via the get/put ops. Signed-off-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 1c623b046a6c72666d81afa004f8bf7f70cd4391 Author: Sage Weil Date: Thu Jun 21 12:47:08 2012 -0700 libceph: use con get/put methods (cherry picked from commit 36eb71aa57e6a33d61fd90a2fd87f00c6844bc86) The ceph_con_get/put() helpers manipulate the embedded con ref count, which isn't used now that ceph_connections are embedded in other structures. Signed-off-by: Sage Weil Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman commit 8124d55a2d2c8d738c6ac5b95e6171fe3d1b5af3 Author: Dan Carpenter Date: Tue Jun 19 08:52:33 2012 -0500 libceph: fix NULL dereference in reset_connection() (cherry picked from commit 26ce171915f348abd1f41da1ed139d93750d987f) We dereference "con->in_msg" on the line after it was set to NULL. Signed-off-by: Dan Carpenter Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman commit fccbf066b35a05606cdb96eb574eaf310c422125 Author: Sage Weil Date: Sat Jun 9 14:19:21 2012 -0700 libceph: transition socket state prior to actual connect (cherry picked from commit 89a86be0ce20022f6ede8bccec078dbb3d63caaa) Once we call ->connect(), we are racing against the actual connection, and a subsequent transition from CONNECTING -> CONNECTED. Set the state to CONNECTING before that, under the protection of the mutex, to avoid the race. This was introduced in 928443cd9644e7cfd46f687dbeffda2d1a357ff9, with the original socket state code. 
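The ordering problem described above can be sketched in userspace as follows. This is only an illustration under simplifying assumptions: demo_connect() is a hypothetical stand-in for ->connect() whose "connected" callback fires before it returns, and the connection mutex mentioned in the commit is left out.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical socket states; the names only mirror the commit text. */
enum demo_sock_state { DEMO_CLOSED, DEMO_CONNECTING, DEMO_CONNECTED };

struct demo_con {
    atomic_int sock_state;
};

/*
 * Stand-in for ->connect(): assume the connection completes so fast
 * that the "connected" callback runs before this function returns.
 */
static void demo_connect(struct demo_con *con)
{
    atomic_store(&con->sock_state, DEMO_CONNECTED);
}

/* Racy order: the late CONNECTING store clobbers CONNECTED. */
static void demo_open_racy(struct demo_con *con)
{
    demo_connect(con);
    atomic_store(&con->sock_state, DEMO_CONNECTING);
}

/* Fixed order: record CONNECTING first, then start the connect. */
static void demo_open_fixed(struct demo_con *con)
{
    atomic_store(&con->sock_state, DEMO_CONNECTING);
    demo_connect(con);
}
```

With the racy order the connection is left recorded as CONNECTING even though the socket is already connected; with the fixed order the final state is CONNECTED.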
Signed-off-by: Sage Weil Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman commit f4d29a959aa619f3d46e2b685cfb8acac41503db Author: Xi Wang Date: Wed Jun 6 19:35:55 2012 -0500 libceph: fix overflow in osdmap_apply_incremental() (cherry picked from commit a5506049500b30dbc5edb4d07a3577477c1f3643) On 32-bit systems, a large `pglen' would overflow `pglen*sizeof(u32)' and bypass the check ceph_decode_need(p, end, pglen*sizeof(u32), bad). It would also overflow the subsequent kmalloc() size, leading to out-of-bounds write. Signed-off-by: Xi Wang Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman commit 6b71f61c321b1fe8d35b75ab573ff46c51e59acc Author: Xi Wang Date: Wed Jun 6 19:35:55 2012 -0500 libceph: fix overflow in osdmap_decode() (cherry picked from commit e91a9b639a691e0982088b5954eaafb5a25c8f1c) On 32-bit systems, a large `n' would overflow `n * sizeof(u32)' and bypass the check ceph_decode_need(p, end, n * sizeof(u32), bad). It would also overflow the subsequent kmalloc() size, leading to out-of-bounds write. Signed-off-by: Xi Wang Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman commit c66a9c7c10955499d96df63ffd87feaee6b01754 Author: Xi Wang Date: Wed Jun 6 19:35:55 2012 -0500 libceph: fix overflow in __decode_pool_names() (cherry picked from commit ad3b904c07dfa88603689bf9a67bffbb9b99beb5) `len' is read from network and thus needs validation. Otherwise a large `len' would cause out-of-bounds access via the memcpy() call. In addition, len = 0xffffffff would overflow the kmalloc() size, leading to out-of-bounds write. This patch adds a check of `len' via ceph_decode_need(). Also use kstrndup rather than kmalloc/memcpy. 
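The wraparound these three commits describe can be reproduced in a small userspace sketch. It models a 32-bit size_t with uint32_t; the helper names are hypothetical, and the division-based check is one common way to avoid the overflow, not necessarily the exact check the patches add.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Models a 32-bit size_t: the product wraps modulo 2^32. */
static uint32_t demo_wrapping_bytes(uint32_t n)
{
    return (uint32_t)(n * (uint32_t)sizeof(uint32_t));
}

/* Naive bounds check: fooled when the product wraps to a small value. */
static bool demo_naive_need(uint32_t n, uint32_t avail)
{
    return demo_wrapping_bytes(n) <= avail;
}

/* Safer bounds check: divide instead of multiply, so nothing can wrap. */
static bool demo_checked_need(uint32_t n, uint32_t avail)
{
    return n <= avail / (uint32_t)sizeof(uint32_t);
}
```

For n = 0x40000001 the product wraps to 4, so the naive check accepts a 16-byte buffer while the division-based check rejects it.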
[elder@inktank.com: added -ENOMEM return for null kstrndup() result] Signed-off-by: Xi Wang Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman commit ce4516fbb42d2ad5adba4699ebc1703d4e08e821 Author: Alex Elder Date: Fri Jun 1 14:56:43 2012 -0500 libceph: make ceph_con_revoke_message() a msg op (cherry picked from commit 8921d114f5574c6da2cdd00749d185633ecf88f3) ceph_con_revoke_message() is passed both a message and a ceph connection. A ceph_msg allocated for incoming messages on a connection always has a pointer to that connection, so there's no need to provide the connection when revoking such a message. Note that the existing logic does not preclude the message supplied being a null/bogus message pointer. The only user of this interface is the OSD client, and the only value an osd client passes is a request's r_reply field. That is always non-null (except briefly in an error path in ceph_osdc_alloc_request(), and that drops the only reference so the request won't ever have a reply to revoke). So we can safely assume the passed-in message is non-null, but add a BUG_ON() to make it very obvious we are imposing this restriction. Rename the function ceph_msg_revoke_incoming() to reflect that it is really an operation on an incoming message. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit ae048538ab62c31f67d42e00a3183b8870809a3c Author: Alex Elder Date: Fri Jun 1 14:56:43 2012 -0500 libceph: make ceph_con_revoke() a msg operation (cherry picked from commit 6740a845b2543cc46e1902ba21bac743fbadd0dc) ceph_con_revoke() is passed both a message and a ceph connection. Now that any message associated with a connection holds a pointer to that connection, there's no need to provide the connection when revoking a message. This has the added benefit of precluding the possibility of providing the wrong connection pointer.
If the message's connection pointer is null, it is not being tracked by any connection, so revoking it is a no-op. This is supported as a convenience for upper layers, so they can revoke a message that is not actually "in flight." Rename the function ceph_msg_revoke() to reflect that it is really an operation on a message, not a connection. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit bfd357201d3ffc6cc621e4c69fd47e7d457e5f3a Author: Alex Elder Date: Mon Jun 4 14:43:33 2012 -0500 libceph: have messages take a connection reference (cherry picked from commit 92ce034b5a740046cc643a21ea21eaad589e0043) There are essentially two types of ceph messages: incoming and outgoing. Outgoing messages are always allocated via ceph_msg_new(), and at the time of their allocation they are not associated with any particular connection. Incoming messages are always allocated via ceph_con_in_msg_alloc(), and they are initially associated with the connection from which incoming data will be placed into the message. When an outgoing message gets sent, it becomes associated with a connection and remains that way until the message is successfully sent. The association of an incoming message goes away at the point it is sent to an upper layer via a con->ops->dispatch method. This patch implements reference counting for all ceph messages, such that every message holds a reference (and a pointer) to a connection if and only if it is associated with that connection (as described above). For background, here is an explanation of the ceph message lifecycle, emphasizing when an association exists between a message and a connection. Outgoing Messages An outgoing message is "owned" by its allocator, from the time it is allocated in ceph_msg_new() up to the point it gets queued for sending in ceph_con_send(). 
Prior to that point the message's msg->con pointer is null; at the point it is queued for sending its message pointer is assigned to refer to the connection. At that time the message is inserted into a connection's out_queue list. When a message on the out_queue list has been sent to the socket layer to be put on the wire, it is transferred out of that list and into the connection's out_sent list. At that point it is still owned by the connection, and will remain so until an acknowledgement is received from the recipient that indicates the message was successfully transferred. When such an acknowledgement is received (in process_ack()), the message is removed from its list (in ceph_msg_remove()), at which point it is no longer associated with the connection. So basically, any time a message is on one of a connection's lists, it is associated with that connection. Reference counting outgoing messages can thus be done at the points a message is added to the out_queue (in ceph_con_send()) and the point it is removed from either of its two lists (in ceph_msg_remove())--at which point its connection pointer becomes null. Incoming Messages When an incoming message on a connection is getting read (in read_partial_message()) and there is no message in con->in_msg, a new one is allocated using ceph_con_in_msg_alloc(). At that point the message is associated with the connection. Once that message has been completely and successfully read, it is passed to upper layer code using the connection's con->ops->dispatch method. At that point the association between the message and the connection no longer exists. Reference counting of connections for incoming messages can be done by taking a reference to the connection when the message gets allocated, and releasing that reference when it gets handed off using the dispatch method. We should never fail to get a connection reference for a message, since the caller should already hold one.
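The outgoing-message half of the lifecycle described above can be sketched as follows. This is a minimal single-threaded illustration with hypothetical names: a plain integer stands in for whatever reference the connection's get/put methods actually manage, and the out_queue/out_sent lists are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a connection and a message. */
struct demo_con {
    int nref;                   /* references held on the connection */
};

struct demo_msg {
    struct demo_con *con;       /* non-null only while associated */
};

static struct demo_con *demo_con_get(struct demo_con *con)
{
    con->nref++;
    return con;
}

static void demo_con_put(struct demo_con *con)
{
    assert(con->nref > 0);
    con->nref--;
}

/* Queue for sending: the message becomes associated with the connection. */
static void demo_con_send(struct demo_con *con, struct demo_msg *msg)
{
    assert(msg->con == NULL);   /* not yet owned by any connection */
    msg->con = demo_con_get(con);
}

/* Acked and removed from its list: the association (and reference) ends. */
static void demo_msg_remove(struct demo_msg *msg)
{
    assert(msg->con != NULL);
    demo_con_put(msg->con);
    msg->con = NULL;
}
```

The invariant is the one stated in the commit message: the message holds a reference exactly when its connection pointer is non-null.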
Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit e84e066e5c8c858d3954b2ef1da25c14309e4cef Author: Alex Elder Date: Fri Jun 1 14:56:43 2012 -0500 libceph: have messages point to their connection (cherry picked from commit 38941f8031bf042dba3ced6394ba3a3b16c244ea) When a ceph message is queued for sending it is placed on a list of pending messages (ceph_connection->out_queue). When they are actually sent over the wire, they are moved from that list to another (ceph_connection->out_sent). When acknowledgement for the message is received, it is removed from the sent messages list. During that entire time the message is "in the possession" of a single ceph connection. Keep track of that connection in the message. This will be used in the next patch (and is a helpful bit of information for debugging anyway). Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 35067a20685e5f51513c3633256e658fc71e847e Author: Alex Elder Date: Mon Jun 4 14:43:32 2012 -0500 libceph: tweak ceph_alloc_msg() (cherry picked from commit 1c20f2d26795803fc4f5155fe4fca5717a5944b6) The function ceph_alloc_msg() is only used to allocate a message that will be assigned to a connection's in_msg pointer. Rename the function so this implied usage is more clear. In addition, make that assignment inside the function (again, since that's precisely what it's intended to be used for). This allows us to return what is now provided via the passed-in address of a "skip" variable. The return type is now Boolean to be explicit that there are only two possible outcomes. Make sure the result of an ->alloc_msg method call always sets the value of *skip properly. 
Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 6880138c03448b3c375a3d7a8ef6acd688e6fb40 Author: Alex Elder Date: Sat May 26 23:26:43 2012 -0500 libceph: fully initialize connection in con_init() (cherry picked from commit 1bfd89f4e6e1adc6a782d94aa5d4c53be1e404d7) Move the initialization of a ceph connection's private pointer, operations vector pointer, and peer name information into ceph_con_init(). Rearrange the arguments so the connection pointer is first. Hide the byte-swapping of the peer entity number inside ceph_con_init() Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 9403ae33bf946342b23cfe3dbf3e4c9b86860c97 Author: Alex Elder Date: Sat May 26 23:26:43 2012 -0500 libceph: init monitor connection when opening (cherry picked from commit 20581c1faf7b15ae1f8b80c0ec757877b0b53151) Hold off initializing a monitor client's connection until just before it gets opened for use. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit a2b87615e2acfb851ec43603d0061f631381301a Author: Sage Weil Date: Thu May 31 20:27:50 2012 -0700 libceph: drop connection refcounting for mon_client (cherry picked from commit ec87ef4309d33bd9c87a53bb5152a86ae7a65f25) All references to the embedded ceph_connection come from the msgr workqueue, which is drained prior to mon_client destruction. That means we can ignore con refcounting entirely. Signed-off-by: Sage Weil Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman commit 31a84d83433edc79151e28762c1992c0708b222c Author: Alex Elder Date: Sat May 26 23:26:43 2012 -0500 libceph: embed ceph connection structure in mon_client (cherry picked from commit 67130934fb579fdf0f2f6d745960264378b57dc8) A monitor client has a pointer to a ceph connection structure in it. This is the only one of the three ceph client types that do it this way; the OSD and MDS clients embed the connection into their main structures. 
There is always exactly one ceph connection for a monitor client, so there is no need to allocate it separate from the monitor client structure. So switch the ceph_mon_client structure to embed its ceph_connection structure. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 51588ed26f489e50bfd2359d55abcb4d907149bc Author: Alex Elder Date: Tue May 29 11:04:58 2012 -0500 libceph: set CLOSED state bit in con_init (cherry picked from commit a5988c490ef66cb04ea2f610681949b25c773b3c) Once a connection is fully initialized, it is really in a CLOSED state, so make that explicit by setting the bit in its state field. It is possible for a connection in NEGOTIATING state to get a failure, leading to ceph_fault() and ultimately ceph_con_close(). Clear that bit if it is set in that case, to reflect that the connection truly is closed and is no longer participating in a connect sequence. Issue a warning if ceph_con_open() is called on a connection that is not in CLOSED state. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit d39319ee9b0381848a7e2261d53914e2732191d7 Author: Alex Elder Date: Sat May 26 23:26:43 2012 -0500 libceph: provide osd number when creating osd (cherry picked from commit e10006f807ffc4d5b1d861305d18d9e8145891ca) Pass the osd number to the create_osd() routine, and move the initialization of fields that depend on it therein. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 0bcd15777405bf024a3ec591731582f7263ea1c0 Author: Alex Elder Date: Tue May 22 22:15:49 2012 -0500 libceph: start tracking connection socket state (cherry picked from commit ce2c8903e76e690846a00a0284e4bd9ee954d680) Start explicitly keeping track of the state of a ceph connection's socket, separate from the state of the connection itself. Create placeholder functions to encapsulate the state transitions.

     --------
     | NEW* |  transient initial state
     --------
         | con_sock_state_init()
         v
     ----------
     | CLOSED |  initialized, but no socket (and no
     ----------  TCP connection)
      ^      \
      |       \ con_sock_state_connecting()
      |        ----------------------
      |                              \
      + con_sock_state_closed()       \
      |\                               \
      | \                               \
      |  -----------                     \
      |  | CLOSING |  socket event;       \
      |  -----------  await close          \
      |       ^                            |
      |       |                            |
      |       + con_sock_state_closing()   |
      |      / \                           |
      |     /   ---------------            |
      |    /                   \           v
      |   /                    --------------
      |  /    -----------------| CONNECTING |  socket created, TCP
      |  |   /                 --------------  connect initiated
      |  |   | con_sock_state_connected()
      |  |   v
     -------------
     | CONNECTED |  TCP connection established
     -------------

Make the socket state an atomic variable, reinforcing that it's a distinct transition with no possible "intermediate/both" states. This is almost certainly overkill at this point, though the transitions into CONNECTED and CLOSING state do get called via socket callback (the rest of the transitions occur with the connection mutex held). We can back out the atomicity later. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit bc327474a0c9f3477be61b2d3e33833ef7b01bf9 Author: Alex Elder Date: Tue May 22 11:41:43 2012 -0500 libceph: start separating connection flags from state (cherry picked from commit 928443cd9644e7cfd46f687dbeffda2d1a357ff9) A ceph_connection holds a mixture of connection state (as in "state machine" state) and connection flags in a single "state" field. To make the distinction more clear, define a new "flags" field and use it rather than the "state" field to hold Boolean flag values. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit d910c114b6da5b78c88889eff1b3f9e83c6f81cb Author: Alex Elder Date: Sat May 26 23:26:43 2012 -0500 libceph: embed ceph messenger structure in ceph_client (cherry picked from commit 15d9882c336db2db73ccf9871ae2398e452f694c) A ceph client has a pointer to a ceph messenger structure in it.
There is always exactly one ceph messenger for a ceph client, so there is no need to allocate it separate from the ceph client structure. Switch the ceph_client structure to embed its ceph_messenger structure. Signed-off-by: Alex Elder Reviewed-by: Yehuda Sadeh Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 4874ba9c07e2fa418cd7272d657f5cc883efd35a Author: Alex Elder Date: Wed May 23 14:35:23 2012 -0500 libceph: rename kvec_reset and kvec_add functions (cherry picked from commit e22004235a900213625acd6583ac913d5a30c155) The functions ceph_con_out_kvec_reset() and ceph_con_out_kvec_add() are entirely private functions, so drop the "ceph_" prefix in their name to make them slightly more wieldy. Signed-off-by: Alex Elder Reviewed-by: Yehuda Sadeh Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit f5e79a443054452866bba4856ed243faf502d708 Author: Alex Elder Date: Tue May 22 11:41:43 2012 -0500 libceph: rename socket callbacks (cherry picked from commit 327800bdc2cb9b71f4b458ca07aa9d522668dde0) Change the names of the three socket callback functions to make it more obvious they're specifically associated with a connection's socket (not the ceph connection that uses it). Signed-off-by: Alex Elder Reviewed-by: Yehuda Sadeh Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 809c58f1bd5fa3d3e9ff1d3614c00c1a1239abf1 Author: Alex Elder Date: Tue May 29 21:47:38 2012 -0500 libceph: kill bad_proto ceph connection op (cherry picked from commit 6384bb8b8e88a9c6bf2ae0d9517c2c0199177c34) No code sets a bad_proto method in its ceph connection operations vector, so just get rid of it. 
Signed-off-by: Alex Elder Reviewed-by: Yehuda Sadeh Signed-off-by: Greg Kroah-Hartman commit ac7a42681718cd7474cec70f198f0684ba7444eb Author: Alex Elder Date: Tue May 22 11:41:43 2012 -0500 libceph: eliminate connection state "DEAD" (cherry picked from commit e5e372da9a469dfe3ece40277090a7056c566838) The ceph connection state "DEAD" is never set and is therefore not needed. Eliminate it. Signed-off-by: Alex Elder Reviewed-by: Yehuda Sadeh Signed-off-by: Greg Kroah-Hartman commit e7fda85c9dab7396c5ed7345ab7c6fcdf4ffc366 Author: Yan, Zheng Date: Mon May 28 14:44:30 2012 +0800 ceph: check PG_Private flag before accessing page->private (cherry picked from commit 28c0254ede13ab575d2df5c6585ed3d4817c3e6b) I got lots of NULL pointer dereference Oops when compiling kernel on ceph. The bug is because the kernel page migration routine replaces some pages in the page cache with new pages, these new pages' private can be non-zero. Signed-off-by: Zheng Yan Signed-off-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 9923ad77a69ab565af2746262989cc0340943668 Author: Yan, Zheng Date: Wed Jun 6 09:15:33 2012 -0500 rbd: Fix ceph_snap_context size calculation (cherry picked from commit f9f9a1904467816452fc70740165030e84c2c659) ceph_snap_context->snaps is an u64 array Signed-off-by: Zheng Yan Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman commit 6448acf6e6b603c4fa21697931cd400ebee21d4a Author: Josh Durgin Date: Mon Nov 21 13:04:42 2011 -0800 rbd: store snapshot id instead of index (cherry picked from commit 77dfe99fe3cb0b2b0545e19e2d57b7a9134ee3c0) When a device was open at a snapshot, and snapshots were deleted or added, data from the wrong snapshot could be read. Instead of assuming the snap context is constant, store the actual snap id when the device is initialized, and rely on the OSDs to signal an error if we try reading from a snapshot that was deleted. 
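Why a stored index goes stale while a stored id does not can be seen in a small sketch. The array contents and the helper demo_find_snap() are made up for illustration; the commit itself relies on the OSDs to report a deleted snapshot rather than on a client-side lookup like this one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the position of snap_id in snaps[],
 * or -1 if that snapshot no longer exists.
 */
static int demo_find_snap(const uint64_t *snaps, size_t count,
                          uint64_t snap_id)
{
    for (size_t i = 0; i < count; i++) {
        if (snaps[i] == snap_id)
            return (int)i;
    }
    return -1;
}
```

If a device is opened at index 1 of {30, 20, 10} and a snapshot is then added so the list becomes {40, 30, 20, 10}, index 1 silently refers to snapshot 30, whereas the remembered id 20 still identifies the intended snapshot (or is detectably absent once it has been deleted).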
Signed-off-by: Josh Durgin Reviewed-by: Alex Elder Reviewed-by: Yehuda Sadeh Signed-off-by: Greg Kroah-Hartman commit 45739514727c294e843269d515b952d5dbd911bf Author: Josh Durgin Date: Mon Dec 5 10:47:13 2011 -0800 rbd: protect read of snapshot sequence number (cherry picked from commit 403f24d3d51760a8b9368d595fa5f48c309f1a0f) This is updated whenever a snapshot is added or deleted, and the snapc pointer is changed with every refresh of the header. Signed-off-by: Josh Durgin Reviewed-by: Alex Elder Reviewed-by: Yehuda Sadeh Signed-off-by: Greg Kroah-Hartman commit 095cb2142dfb10daea5a3351ed9646d3178a6823 Author: Alex Elder Date: Wed Apr 4 13:35:44 2012 -0500 rbd: don't hold spinlock during messenger flush (cherry picked from commit cd9d9f5df6098c50726200d4185e9e8da32785b3) A recent change made changes to the rbd_client_list be protected by a spinlock. Unfortunately in rbd_put_client(), the lock is taken before possibly dropping the last reference to an rbd_client, and on the last reference that eventually calls flush_workqueue() which can sleep. The problem was flagged by a debug spinlock warning: BUG: spinlock wrong CPU on CPU#3, rbd/27814 The solution is to move the spinlock acquisition and release inside rbd_client_release(), which is the spot where it's really needed for protecting the removal of the rbd_client from the client list. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 49da293c7dc4401c2c7963a2c70f633b1c8fa8c5 Author: Sage Weil Date: Tue Jul 10 11:53:34 2012 -0700 libceph: fix messenger retry (cherry picked from commit 5bdca4e0768d3e0f4efa43d9a2cc8210aeb91ab9) In ancient times, the messenger could both initiate and accept connections. An artifact of that was data structures to store/process an incoming ceph_msg_connect request and send an outgoing ceph_msg_connect_reply.
Sadly, the negotiation code was referencing those structures and ignoring important information (like the peer's connect_seq) from the correct ones. Among other things, this fixes tight reconnect loops where the server sends RETRY_SESSION and we (the client) retry with the same connect_seq as last time. This bug is pretty easily triggered by injecting socket failures on the MDS and running some fs workload like workunits/direct_io/test_sync_io. Signed-off-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 21cbad59b07693104dda76ee4afef41302b2b8fb Author: Sage Weil Date: Sun Jun 10 20:43:56 2012 -0700 libceph: flush msgr queue during mon_client shutdown (cherry picked from commit f3dea7edd3d449fe7a6d402c1ce56a294b985261) (cherry picked from commit 642c0dbde32f34baa7886e988a067089992adc8f) We need to flush the msgr workqueue during mon_client shutdown to ensure that any work affecting our embedded ceph_connection is finished so that we can be safely destroyed. Previously, we were flushing the work queue after osd_client shutdown and before mon_client shutdown to ensure that any osd connection refs to authorizers are flushed. Remove the redundant flush, and document in the comment that the mon_client flush is needed to cover that case as well.
Signed-off-by: Sage Weil Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman commit 576e428b246271f0f078079c68a9f11679c7db8a Author: Yan, Zheng Date: Wed Jun 6 19:35:55 2012 -0500 rbd: Clear ceph_msg->bio_iter for retransmitted message (cherry picked from commit 43643528cce60ca184fe8197efa8e8da7c89a037) (cherry picked from commit b132cf4c733f91bb4dd2277ea049243cf16e8b66) The bug can cause NULL pointer dereference in write_partial_msg_pages Signed-off-by: Zheng Yan Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman commit acecca48781a79040dca822cf96d505904c282c3 Author: Sage Weil Date: Thu May 31 20:22:18 2012 -0700 libceph: use con get/put ops from osd_client (cherry picked from commit 0d47766f14211a73eaf54cab234db134ece79f49) There were a few direct calls to ceph_con_{get,put}() instead of the con ops from osd_client.c. This is a bug since those ops aren't defined to be ceph_con_get/put. This breaks refcounting on the ceph_osd structs that contain the ceph_connections, and could lead to all manner of strangeness. The purpose of the ->get and ->put methods in a ceph connection are to allow the connection to indicate it has a reference to something external to the messaging system, *not* to indicate something external has a reference to the connection. [elder@inktank.com: added that last sentence] Signed-off-by: Sage Weil Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman (cherry picked from commit 88ed6ea0b295f8e2383d599a04027ec596cdf97b) commit 40971fcf1578d743cde0272ad20539f5ea34725a Author: Alex Elder Date: Mon Jun 4 14:43:32 2012 -0500 libceph: osd_client: don't drop reply reference too early (cherry picked from commit ab8cb34a4b2f60281a4b18b1f1ad23bc2313d91b) In ceph_osdc_release_request(), a reference to the r_reply message is dropped. But just after that, that same message is revoked if it was in use to receive an incoming reply. Reorder these so we are sure we hold a reference until we're actually done with the message. 
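The reordering described above amounts to "finish using the object, then drop your reference". A minimal sketch, with hypothetical names and a "freed" marker standing in for an actual free so the ordering can be checked safely:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted message. */
struct demo_msg {
    int nref;
    bool freed;     /* set instead of calling free(), for illustration */
    bool revoked;
};

static void demo_msg_put(struct demo_msg *m)
{
    assert(!m->freed);
    if (--m->nref == 0)
        m->freed = true;
}

static void demo_msg_revoke(struct demo_msg *m)
{
    assert(!m->freed);  /* would be a use-after-free otherwise */
    m->revoked = true;
}

/* Revoke while the reference is still held, and only then drop it. */
static void demo_release_request(struct demo_msg *reply)
{
    demo_msg_revoke(reply);
    demo_msg_put(reply);
}
```

Swapping the two calls in demo_release_request() would trip the assertion in demo_msg_revoke() whenever the put drops the last reference.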
Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman (cherry picked from commit 680584fab05efff732b5ae16ad601ba994d7b505) commit 1c201dffa3f4ef8e53dca7bffb8987a43e3e9139 Author: Sage Weil Date: Mon May 21 09:45:23 2012 -0700 libceph: fix pg_temp updates (cherry picked from commit 6bd9adbdf9ca6a052b0b7455ac67b925eb38cfad) Usually, we are adding pg_temp entries or removing them. Occasionally they update. In that case, osdmap_apply_incremental() was failing because the rbtree entry already exists. Fix by removing the existing entry before inserting a new one. Fixes http://tracker.newdream.net/issues/2446 Signed-off-by: Sage Weil Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman commit 15ba38ebcee664a5a5de57300ee721e0d65b2dd5 Author: Sage Weil Date: Wed May 16 15:16:38 2012 -0500 libceph: avoid unregistering osd request when not registered (cherry picked from commit 35f9f8a09e1e88e31bd34a1e645ca0e5f070dd5c) There is a race between two __unregister_request() callers: the reply path and the ceph_osdc_wait_request(). If we get a reply *and* the timeout expires at roughly the same time, both callers will try to unregister the request, and the second one will do bad things. Simply check if the request is already unregistered; if so, return immediately and do nothing. Fixes http://tracker.newdream.net/issues/2420 Signed-off-by: Sage Weil Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman commit 1d3df0e26616d97a798950eb5fb908b14b1839da Author: Alex Elder Date: Wed May 16 15:16:39 2012 -0500 ceph: add auth buf in prepare_write_connect() (cherry picked from commit 3da54776e2c0385c32d143fd497a7f40a88e29dd) Move the addition of the authorizer buffer to a connection's out_kvec out of get_connect_authorizer() and into its caller. This way, the caller--prepare_write_connect()--can avoid adding the connect header to out_kvec before it has been fully initialized.
Prior to this patch, it was possible for a connect header to be sent over the wire before the authorizer protocol or buffer length fields were initialized. An authorizer buffer associated with that header could also be queued to send only after the connection header that describes it was on the wire. Fixes http://tracker.newdream.net/issues/2424 Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 3f13447d2c90842c9ea5ceb6b4f8b8b7737f04da Author: Alex Elder Date: Wed May 16 15:16:39 2012 -0500 ceph: rename prepare_connect_authorizer() (cherry picked from commit dac1e716c60161867a47745bca592987ca3a9cb2) Change the name of prepare_connect_authorizer(). The next patch is going to make this function no longer add anything to the connection's out_kvec, so it will no longer fit the pattern of the rest of the prepare_connect_*() functions. In addition, pass the address of a variable that will hold the authorization protocol to use. Move the assignment of that to the connection's out_connect structure into prepare_write_connect(). Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 8d19055c84e89980a0e60604bd925bf7cf4cb9e4 Author: Alex Elder Date: Wed May 16 15:16:39 2012 -0500 ceph: return pointer from prepare_connect_authorizer() (cherry picked from commit 729796be9190f57ca40ccca315e8ad34a1eb8fef) Change prepare_connect_authorizer() so it returns a pointer (or pointer-coded error). 
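For readers unfamiliar with the "pointer-coded error" convention mentioned above, here is a userspace approximation. The demo_* helpers imitate the idea behind the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() but are hypothetical re-implementations, and demo_get_authorizer() is an invented function, not the real method.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *demo_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an error-coded pointer. */
static inline long demo_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top DEMO_MAX_ERRNO addresses. */
static inline bool demo_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_handshake {
    int protocol;
};

/* Hypothetical: returns either a valid pointer or a coded error. */
static struct demo_handshake *
demo_get_authorizer(struct demo_handshake *auth, bool fail)
{
    if (fail)
        return demo_err_ptr(-ENOMEM);
    return auth;
}
```

A caller checks the result with demo_is_err() before dereferencing it, and extracts the error code with demo_ptr_err() when the check fires.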
Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit ed35fbcd3cf73dfbff59bf8c20c772925562bc45 Author: Alex Elder Date: Wed May 16 15:16:39 2012 -0500 ceph: use info returned by get_authorizer (cherry picked from commit 8f43fb53894079bf0caab6e348ceaffe7adc651a) Rather than passing a bunch of arguments to be filled in with the content of the ceph_auth_handshake buffer now returned by the get_authorizer method, just use the returned information in the caller, and drop the unnecessary arguments. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 4f33c7ed3796a5078cd9eef0d3af4ebf8f7e1b99 Author: Alex Elder Date: Wed May 16 15:16:39 2012 -0500 ceph: have get_authorizer methods return pointers (cherry picked from commit a3530df33eb91d787d08c7383a0a9982690e42d0) Have the get_authorizer auth_client method return a ceph_auth pointer rather than an integer, pointer-encoding any returned error value. This is to pave the way for making use of the returned value in an upcoming patch. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 83d28f7956228e0dd1774aed1096392d3bfc0597 Author: Alex Elder Date: Wed May 16 15:16:39 2012 -0500 ceph: ensure auth ops are defined before use (cherry picked from commit a255651d4cad89f1a606edd36135af892ada4f20) In the create_authorizer method for both the mds and osd clients, the auth_client->ops pointer is blindly dereferenced. There is no obvious guarantee that this pointer has been assigned. And furthermore, even if the ops pointer is non-null there is definitely no guarantee that the create_authorizer or destroy_authorizer methods are defined. Add checks in both routines to make sure they are defined (non-null) before use. Add similar checks in a few other spots in these files while we're at it. 
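The kind of check described above can be sketched as follows. This is an illustration only: the structures are simplified, hypothetical shapes rather than the real ceph definitions, and the -EINVAL return value is an assumption made for the example.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified method table; the real one has more entries. */
struct auth_ops {
	int (*create_authorizer)(int peer_type);
	void (*destroy_authorizer)(int handle);
};

struct auth_client {
	const struct auth_ops *ops;	/* may be unset */
};

/* Sample method used only to exercise the checked call. */
static int sample_create(int peer_type)
{
	return peer_type + 100;
}

/* Call the optional method only when both the table and the slot are set. */
static int create_authorizer_checked(const struct auth_client *ac, int peer_type)
{
	if (!ac->ops || !ac->ops->create_authorizer)
		return -EINVAL;	/* error code chosen for illustration */
	return ac->ops->create_authorizer(peer_type);
}
```

Testing the table pointer first and the method slot second means neither a missing ops table nor a partially filled one leads to a NULL dereference.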
Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 018a2a13f3cb5e205618b1357124ff25eb3a8223 Author: Alex Elder Date: Wed May 16 15:16:39 2012 -0500 ceph: messenger: reduce args to create_authorizer (cherry picked from commit 74f1869f76d043bad12ec03b4d5f04a8c3d1f157) Make use of the new ceph_auth_handshake structure in order to reduce the number of arguments passed to the create_authorizor method in ceph_auth_client_ops. Use a local variable of that type as a shorthand in the get_authorizer method definitions. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 0f56a54fced6bee6e56a8b84f9adb65a41032866 Author: Alex Elder Date: Wed May 16 15:16:38 2012 -0500 ceph: define ceph_auth_handshake type (cherry picked from commit 6c4a19158b96ea1fb8acbe0c1d5493d9dcd2f147) The definitions for the ceph_mds_session and ceph_osd both contain five fields related only to "authorizers." Encapsulate those fields into their own struct type, allowing for better isolation in some upcoming patches. Fix the #includes in "linux/ceph/osd_client.h" to lay out their more complete canonical path. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 33f0577a991d6d00805450ea29da5a91f6acd1a8 Author: Alex Elder Date: Wed May 16 15:16:38 2012 -0500 ceph: messenger: check return from get_authorizer (cherry picked from commit ed96af646011412c2bf1ffe860db170db355fae5) In prepare_connect_authorizer(), a connection's get_authorizer method is called but ignores its return value. This function can return an error, so check for it and return it if that ever occurs. 
Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 29e1a95eb5de3d6745e6eebb7d22dfaea437783c Author: Alex Elder Date: Wed May 16 15:16:38 2012 -0500 ceph: messenger: rework prepare_connect_authorizer() (cherry picked from commit b1c6b9803f5491e94041e6da96bc9dec3870e792) Change prepare_connect_authorizer() so it returns without dropping the connection mutex if the connection has no get_authorizer method. Use the symbolic CEPH_AUTH_UNKNOWN instead of 0 when assigning authorization protocols. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 59336e08468a9485e499b1449ebc3d68331233db Author: Alex Elder Date: Wed May 16 21:51:59 2012 -0500 ceph: messenger: check prepare_write_connect() result (cherry picked from commit 5a0f8fdd8a0ebe320952a388331dc043d7e14ced) prepare_write_connect() can return an error, but only one of its callers checks for it. All the rest are in functions that already return errors, so it should be fine to return the error if one gets returned. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 59a521ebc05d2bd826d19c4a9a6fb00c313f6a52 Author: Alex Elder Date: Wed May 16 15:16:38 2012 -0500 ceph: don't set WRITE_PENDING too early (cherry picked from commit e10c758e4031a801ea4d2f8fb39bf14c2658d74b) prepare_write_connect() prepares a connect message, then sets WRITE_PENDING on the connection. Then *after* this, it calls prepare_connect_authorizer(), which updates the content of the connection buffer already queued for sending. It's also possible it will result in prepare_write_connect() returning -EAGAIN despite the WRITE_PENDING big getting set. Fix this by preparing the connect authorizer first, setting the WRITE_PENDING bit only after that is done. 
Partially addresses http://tracker.newdream.net/issues/2424 Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 7dd07ab6bdc7634e99747557d6ba342b639b13d9 Author: Alex Elder Date: Wed May 16 15:16:38 2012 -0500 ceph: drop msgr argument from prepare_write_connect() (cherry picked from commit e825a66df97776d30a48a187e3a986736af43945) In all cases, the value passed as the msgr argument to prepare_write_connect() is just con->msgr. Just get the msgr value from the ceph connection and drop the unneeded argument. The only msgr passed to prepare_write_banner() is also therefore just the one from con->msgr, so change that function to drop the msgr argument as well. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit d4ac74c3222ae545e62bb1ab2bd7306d71ba5379 Author: Alex Elder Date: Wed May 16 15:16:38 2012 -0500 ceph: messenger: send banner in process_connect() (cherry picked from commit 41b90c00858129f52d08e6a05c9cfdb0f2bd074d) prepare_write_connect() has an argument indicating whether a banner should be sent out before sending out a connection message. It's only ever set in one of its callers, so move the code that arranges to send the banner into that caller and drop the "include_banner" argument from prepare_write_connect(). Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 726644c635924f48f1bc9f64087e92915bc61fc7 Author: Alex Elder Date: Wed May 16 15:16:38 2012 -0500 ceph: messenger: reset connection kvec caller (cherry picked from commit 84fb3adf6413862cff51d8af3fce5f0b655586a2) Reset a connection's kvec fields in the caller rather than in prepare_write_connect(). This ends up repeating a few lines of code but it's improving the separation between distinct operations on the connection, which we can take advantage of later. 
Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 6b7cc02849d47abf3077e27aec841dbd140b8b35 Author: Alex Elder Date: Wed May 16 15:16:38 2012 -0500 libceph: don't reset kvec in prepare_write_banner() (cherry picked from commit d329156f16306449c273002486c28de3ddddfd89) Move the kvec reset for a connection out of prepare_write_banner and into its only caller. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 1f7631fba2a763f45bfec34bdf731e2089ca3d87 Author: Alex Elder Date: Thu May 10 10:29:50 2012 -0500 ceph: messenger: change read_partial() to take "end" arg (cherry picked from commit fd51653f78cf40a0516e521b6de22f329c5bad8d) Make the second argument to read_partial() be the ending input byte position rather than the beginning offset it now represents. This amounts to moving the addition "to + size" into the caller. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit dd22ce515dc6f8a69957473f1ce0d0d8f043b06a Author: Alex Elder Date: Thu May 10 10:29:50 2012 -0500 ceph: messenger: update "to" in read_partial() caller (cherry picked from commit e6cee71fac27c946a0bbad754dd076e66c4e9dbd) read_partial() always increases whatever "to" value is supplied by adding the requested size to it, and that's the only thing it does with that pointed-to value. Do that pointer advance in the caller (and then only when the updated value will be subsequently used), and change the "to" parameter to be an in-only and non-pointer value. 
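The calling convention the two read_partial() changes arrive at can be sketched in userspace as below. struct conn and fake_recv() are hypothetical stand-ins for the connection and the socket read; only the roles of "end", "size" and the caller-side advance are taken from the descriptions above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a connection: a byte stream plus read progress. */
struct conn {
	const unsigned char *stream;	/* bytes "on the wire" */
	size_t avail;			/* bytes that have arrived so far */
	size_t consumed;		/* bytes already handed to the reader */
	int in_base_pos;		/* progress through the current multi-part read */
};

/* Copy up to len arrived-but-unread bytes; 0 means "nothing yet, try later". */
static int fake_recv(struct conn *c, void *buf, int len)
{
	size_t ready = c->avail - c->consumed;
	size_t n = ready < (size_t)len ? ready : (size_t)len;

	memcpy(buf, c->stream + c->consumed, n);
	c->consumed += n;
	return (int)n;
}

/*
 * Fill object (size bytes), which ends at input position end (exclusive).
 * Returns 1 when the object is complete, 0 when more data is needed.
 */
static int read_partial(struct conn *c, int end, int size, void *object)
{
	while (c->in_base_pos < end) {
		int left = end - c->in_base_pos;
		int have = size - left;
		int ret = fake_recv(c, (unsigned char *)object + have, left);

		if (ret <= 0)
			return ret;
		c->in_base_pos += ret;
	}
	return 1;
}
```

The caller keeps a running end value and adds each object's size to it before the call (end += sizeof(hdr); then read_partial(c, end, sizeof(hdr), &hdr)), which is the "to + size" addition that moved out of the helper.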
Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit d0d7af68e9da4d8ad24052151d5d4add6464f505 Author: Alex Elder Date: Thu May 10 10:29:50 2012 -0500 ceph: messenger: use read_partial() in read_partial_message() (cherry picked from commit 57dac9d1620942608306d8c17c98a9d1568ffdf4) There are two blocks of code in read_partial_message()--those that read the header and footer of the message--that can be replaced by a call to read_partial(). Do that. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit f77637d9f603dd74cf1b4366d7bb1938a6715ab2 Author: Alex Elder Date: Fri Apr 20 15:49:43 2012 -0500 ceph: osd_client: fix endianness bug in osd_req_encode_op() (cherry picked from commit 065a68f9167e20f321a62d044cb2c3024393d455) From Al Viro Al Viro noticed that we were using a non-cpu-encoded value in a switch statement in osd_req_encode_op(). The result would clearly not work correctly on a big-endian machine. Signed-off-by: Alex Elder Signed-off-by: Greg Kroah-Hartman commit cf34fc7d48d9600665e570b5c8a297a52bbe5fc5 Author: Sage Weil Date: Mon May 7 15:37:05 2012 -0700 crush: fix memory leak when destroying tree buckets (cherry picked from commit 6eb43f4b5a2a74599b4ff17a97c03a342327ca65) Reflects ceph.git commit 46d63d98434b3bc9dad2fc9ab23cbaedc3bcb0e4. Reported-by: Alexander Lyakas Reviewed-by: Alex Elder Signed-off-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 9a0117ae53308d0d8284ba5664ed4c1d0ec54176 Author: Sage Weil Date: Mon May 7 15:36:49 2012 -0700 crush: fix tree node weight lookup (cherry picked from commit f671d4cd9b36691ac4ef42cde44c1b7a84e13631) Fix the node weight lookup for tree buckets by using a correct accessor. Reflects ceph.git commit d287ade5bcbdca82a3aef145b92924cf1e856733. 
Reviewed-by: Alex Elder Signed-off-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 20501b9e6e1db8e7ab6668ef15d697b1c057a50a Author: Sage Weil Date: Mon May 7 15:35:24 2012 -0700 crush: be more tolerant of nonsensical crush maps (cherry picked from commit a1f4895be8bf1ba56c2306b058f51619e9b0e8f8) If we get a map that doesn't make sense, error out or ignore the badness instead of BUGging out. This reflects the ceph.git commits 9895f0bff7dc68e9b49b572613d242315fb11b6c and 8ded26472058d5205803f244c2f33cb6cb10de79. Reviewed-by: Alex Elder Signed-off-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 9926776ca573b23914d5265457202307a3715748 Author: Sage Weil Date: Mon May 7 15:35:09 2012 -0700 crush: adjust local retry threshold (cherry picked from commit c90f95ed46393e29d843686e21947d1c6fcb1164) This small adjustment reflects a change that was made in ceph.git commit af6a9f30696c900a2a8bd7ae24e8ed15fb4964bb, about 6 months ago. An N-1 search is not exhaustive. Fixed ceph.git bug #1594. Reviewed-by: Alex Elder Signed-off-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 506b4672ace55889c16d4e9d5515e0c1ae7832d5 Author: Sage Weil Date: Mon May 7 15:38:35 2012 -0700 crush: clean up types, const-ness (cherry picked from commit 8b12d47b80c7a34dffdd98244d99316db490ec58) Move various types from int -> __u32 (or similar), and add const as appropriate. This reflects changes that have been present in the userland implementation for some time. Reviewed-by: Alex Elder Signed-off-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit 55649211861616c26aa25c9e710c5691837975e4 Author: Dave Jones Date: Thu Nov 8 16:09:27 2012 -0800 selinux: fix sel_netnode_insert() suspicious rcu dereference commit 88a693b5c1287be4da937699cb82068ce9db0135 upstream. =============================== [ INFO: suspicious RCU usage. ] 3.5.0-rc1+ #63 Not tainted ------------------------------- security/selinux/netnode.c:178 suspicious rcu_dereference_check() usage! 
other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by trinity-child1/8750: #0: (sel_netnode_lock){+.....}, at: [] sel_netnode_sid+0x16a/0x3e0 stack backtrace: Pid: 8750, comm: trinity-child1 Not tainted 3.5.0-rc1+ #63 Call Trace: [] lockdep_rcu_suspicious+0xfd/0x130 [] sel_netnode_sid+0x3b1/0x3e0 [] ? sel_netnode_find+0x1a0/0x1a0 [] selinux_socket_bind+0xf6/0x2c0 [] ? trace_hardirqs_off+0xd/0x10 [] ? lock_release_holdtime.part.9+0x15/0x1a0 [] ? lock_hrtimer_base+0x31/0x60 [] security_socket_bind+0x16/0x20 [] sys_bind+0x7a/0x100 [] ? sysret_check+0x22/0x5d [] ? trace_hardirqs_on_caller+0x10d/0x1a0 [] ? trace_hardirqs_on_thunk+0x3a/0x3f [] system_call_fastpath+0x16/0x1b This patch below does what Paul McKenney suggested in the previous thread. Signed-off-by: Dave Jones Reviewed-by: Paul E. McKenney Acked-by: Paul Moore Cc: Eric Paris Signed-off-by: Andrew Morton Signed-off-by: James Morris Signed-off-by: Greg Kroah-Hartman commit a23d6310a6fbe4a2a1d3a40251a6d5b8ae39ec22 Author: Jan Kara Date: Tue Nov 13 18:25:38 2012 +0100 reiserfs: Protect reiserfs_quota_write() with write lock commit 361d94a338a3fd0cee6a4ea32bbc427ba228e628 upstream. Calls into reiserfs journalling code and reiserfs_get_block() need to be protected with write lock. We remove write lock around calls to high level quota code in the next patch so these paths would suddently become unprotected. Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman commit 8c7dcc4819098b0166e4559fbfd6fa661fbb5755 Author: Jan Kara Date: Tue Nov 13 17:05:14 2012 +0100 reiserfs: Move quota calls out of write lock commit 7af11686933726e99af22901d622f9e161404e6b upstream. Calls into highlevel quota code cannot happen under the write lock. These calls take dqio_mutex which ranks above write lock. So drop write lock before calling back into quota code. 
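The ordering rule behind this change can be illustrated with a toy rank checker. Nothing here is reiserfs or quota code: the ranks, the tracker and quota_call() are hypothetical, and the only assumption carried over from the text is that the quota lock must be taken before (outside) the filesystem write lock.

```c
#include <stdbool.h>

/* Hypothetical ranks: a lock may be taken only while no equal or higher rank is held. */
enum lock_rank { RANK_NONE = 0, RANK_DQIO = 1, RANK_FS_WRITE = 2, RANK_COUNT = 3 };

struct lock_tracker {
	bool held[RANK_COUNT];
};

/* Returns false if taking rank now would invert the documented order. */
static bool tracker_acquire(struct lock_tracker *t, enum lock_rank rank)
{
	for (int r = rank; r < RANK_COUNT; r++)
		if (t->held[r])
			return false;	/* an equal or higher-ranked lock is already held */
	t->held[rank] = true;
	return true;
}

static void tracker_release(struct lock_tracker *t, enum lock_rank rank)
{
	t->held[rank] = false;
}

/* Stand-in for high-level quota code, which takes the quota lock internally. */
static bool quota_call(struct lock_tracker *t)
{
	if (!tracker_acquire(t, RANK_DQIO))
		return false;
	tracker_release(t, RANK_DQIO);
	return true;
}
```

With the write lock held, quota_call() is rejected by the checker; releasing the write lock first, calling into quota code, and retaking the lock afterwards is the pattern that passes.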
Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman commit 8ed4d1ceb26e7f1011314a4f2db93897fb8949e2 Author: Jan Kara Date: Tue Nov 13 16:34:17 2012 +0100 reiserfs: Protect reiserfs_quota_on() with write lock commit b9e06ef2e8706fe669b51f4364e3aeed58639eb2 upstream. In reiserfs_quota_on() we do quite some work - for example unpacking tail of a quota file. Thus we have to hold write lock until a moment we call back into the quota code. Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman commit 394cbbc4417c009f2831122e9d64c975c8249123 Author: Jan Kara Date: Tue Nov 13 14:55:52 2012 +0100 reiserfs: Fix lock ordering during remount commit 3bb3e1fc47aca554e7e2cc4deeddc24750987ac2 upstream. When remounting reiserfs dquot_suspend() or dquot_resume() can be called. These functions take dqonoff_mutex which ranks above write lock so we have to drop it before calling into quota code. Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman commit 824904ce9efd5665024b03217372ca82170687d2 Author: Bryan Schumaker Date: Tue Oct 30 16:06:35 2012 -0400 NFS: Wait for session recovery to finish before returning commit 399f11c3d872bd748e1575574de265a6304c7c43 upstream. Currently, we will schedule session recovery and then return to the caller of nfs4_handle_exception. This works for most cases, but causes a hang on the following test case: Client Server ------ ------ Open file over NFS v4.1 Write to file Expire client Try to lock file The server will return NFS4ERR_BADSESSION, prompting the client to schedule recovery. However, the client will continue placing lock attempts and the open recovery never seems to be scheduled. The simplest solution is to wait for session recovery to run before retrying the lock. 
Signed-off-by: Bryan Schumaker Signed-off-by: Trond Myklebust [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman commit d1b28a26ed24cec32801bfddb48796f914f48d47 Author: Daniel Vetter Date: Mon Oct 22 12:55:55 2012 +0200 drm/i915: fix overlay on i830M commit a9193983f4f292a82a00c72971c17ec0ee8c6c15 upstream. The overlay on the i830M has a peculiar failure mode: It works the first time around after boot-up, but consistenly hangs the second time it's used. Chris Wilson has dug out a nice errata: "1.5.12 Clock Gating Disable for Display Register Address Offset: 06200h–06203h "Bit 3 Ovrunit Clock Gating Disable. 0 = Clock gating controlled by unit enabling logic 1 = Disable clock gating function DevALM Errata ALM049: Overlay Clock Gating Must be Disabled: Overlay & L2 Cache clock gating must be disabled in order to prevent device hangs when turning off overlay.SW must turn off Ovrunit clock gating (6200h) and L2 Cache clock gating (C8h)." Now I've nowhere found that 0xc8 register and hence couldn't apply the l2 cache workaround. But I've remembered that part of the magic that the OVERLAY_ON/OFF commands are supposed to do is to rearrange cache allocations so that the overlay scaler has some scratch space. And while pondering how that could explain the hang the 2nd time we enable the overlay, I've remembered that the old ums overlay code did _not_ issue the OVERLAY_OFF cmd. And indeed, disabling the OFF cmd results in the overlay working flawlessly, so I guess we can workaround the lack of the above workaround by simply never disabling the overlay engine once it's enabled. Note that we have the first part of the above w/a already implemented in i830_init_clock_gating - leave that as-is to avoid surprises. v2: Add a comment in the code. 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47827 Tested-by: Rhys Reviewed-by: Chris Wilson Signed-off-by: Daniel Vetter [bwh: Backported to 3.2: - Adjust context - s/intel_ring_emit(ring, /OUT_RING(/] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman commit 37a42f991f36aae9b064bc3f39b760006bd17131 Author: Martin Schwidefsky Date: Wed Nov 7 10:44:08 2012 +0100 s390/signal: set correct address space control commit fa968ee215c0ca91e4a9c3a69ac2405aae6e5d2f upstream. If user space is running in primary mode it can switch to secondary or access register mode, this is used e.g. in the clock_gettime code of the vdso. If a signal is delivered to the user space process while it has been running in access register mode the signal handler is executed in access register mode as well which will result in a crash most of the time. Set the address space control bits in the PSW to the default for the execution of the signal handler and make sure that the previous address space control is restored on signal return. Take care that user space can not switch to the kernel address space by modifying the registers in the signal frame. Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman commit 39220c9e67cbfbc1a7ac17a51afd77eb06b62fa9 Author: Mirko Lindner Date: Tue Jul 3 23:38:46 2012 +0000 sky2: Fix for interrupt handler commit d663d181b9e92d80c2455e460e932d34e7a2a7ae upstream. Re-enable interrupts if it is not our interrupt Signed-off-by: Mirko Lindner Signed-off-by: David S. Miller Cc: Jonathan Nieder Signed-off-by: Greg Kroah-Hartman commit 122fd46fccafa6d839118815141179f1a90cfb48 Author: Tim Sally Date: Thu Jul 12 19:10:24 2012 -0400 eCryptfs: check for eCryptfs cipher support at mount commit 5f5b331d5c21228a6519dcb793fc1629646c51a6 upstream. The issue occurs when eCryptfs is mounted with a cipher supported by the crypto subsystem but not by eCryptfs. The mount succeeds and an error does not occur until a write. 
This change checks for eCryptfs cipher support at mount time. Resolves Launchpad issue #338914, reported by Tyler Hicks in 03/2009. https://bugs.launchpad.net/ecryptfs/+bug/338914 Signed-off-by: Tim Sally Signed-off-by: Tyler Hicks Cc: Herton Ronaldo Krzesinski Signed-off-by: Greg Kroah-Hartman commit c8a1ae7c0030e936470a4701a46f775a2bc7012b Author: Tyler Hicks Date: Mon Jun 11 15:42:32 2012 -0700 eCryptfs: Copy up POSIX ACL and read-only flags from lower mount commit 069ddcda37b2cf5bb4b6031a944c0e9359213262 upstream. When the eCryptfs mount options do not include '-o acl', but the lower filesystem's mount options do include 'acl', the MS_POSIXACL flag is not flipped on in the eCryptfs super block flags. This flag is what the VFS checks in do_last() when deciding if the current umask should be applied to a newly created inode's mode or not. When a default POSIX ACL mask is set on a directory, the current umask is incorrectly applied to new inodes created in the directory. This patch ignores the MS_POSIXACL flag passed into ecryptfs_mount() and sets the flag on the eCryptfs super block depending on the flag's presence on the lower super block. Additionally, it is incorrect to allow a writeable eCryptfs mount on top of a read-only lower mount. This missing check did not allow writes to the read-only lower mount because permissions checks are still performed on the lower filesystem's objects but it is best to simply not allow a rw mount on top of ro mount. However, a ro eCryptfs mount on top of a rw mount is valid and still allowed. https://launchpad.net/bugs/1009207 Signed-off-by: Tyler Hicks Reported-by: Stefan Beller Cc: John Johansen Cc: Herton Ronaldo Krzesinski Signed-off-by: Greg Kroah-Hartman commit 9dbd3418ff461025e904e65326aeeb5389f65799 Author: Jan Safrata Date: Tue May 22 14:04:50 2012 +0200 usb: use usb_serial_put in usb_serial_probe errors commit 0658a3366db7e27fa32c12e886230bb58c414c92 upstream. 
The use of kfree(serial) in error cases of usb_serial_probe was invalid - usb_serial structure allocated in create_serial() gets reference of usb_device that needs to be put, so we need to use usb_serial_put() instead of simple kfree(). Signed-off-by: Jan Safrata Acked-by: Johan Hovold Cc: Richard Retanubun Signed-off-by: Greg Kroah-Hartman commit a39bdce2f2a9aebda1b9438c4b2e91c0dd507a34 Author: Ulrich Weber Date: Thu Oct 25 05:34:45 2012 +0000 netfilter: nf_nat: don't check for port change on ICMP tuples commit 38fe36a248ec3228f8e6507955d7ceb0432d2000 upstream. ICMP tuples have id in src and type/code in dst. So comparing src.u.all with dst.u.all will always fail here and ip_xfrm_me_harder() is called for every ICMP packet, even if there was no NAT. Signed-off-by: Ulrich Weber Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit b3e991ea9222c3ec71d74b37d105cea115055c4d Author: Jozsef Kadlecsik Date: Fri Aug 31 09:55:53 2012 +0000 netfilter: Mark SYN/ACK packets as invalid from original direction commit 64f509ce71b08d037998e93dd51180c19b2f464c upstream. Clients should not send such packets. By accepting them, we open up a hole by wich ephemeral ports can be discovered in an off-path attack. See: "Reflection scan: an Off-Path Attack on TCP" by Jan Wrobel, http://arxiv.org/abs/1201.2074 Signed-off-by: Jozsef Kadlecsik Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit c581c7c77d5685829bf620dc6e194e2aa2afae00 Author: Jozsef Kadlecsik Date: Fri Aug 31 09:55:54 2012 +0000 netfilter: Validate the sequence number of dataless ACK packets as well commit 4a70bbfaef0361d27272629d1a250a937edcafe4 upstream. We spare nothing by not validating the sequence number of dataless ACK packets and enabling it makes harder off-path attacks. 
See: "Reflection scan: an Off-Path Attack on TCP" by Jan Wrobel, http://arxiv.org/abs/1201.2074 Signed-off-by: Jozsef Kadlecsik Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit 1b10e0be50067689f53e566d5e1cfad6170e89c3 Author: Nathan Walp Date: Thu Nov 1 12:08:47 2012 +0000 r8169: allow multicast packets on sub-8168f chipset. commit 0481776b7a70f09acf7d9d97c288c3a8403fbfe4 upstream. RTL_GIGA_MAC_VER_35 includes no multicast hardware filter. Signed-off-by: Nathan Walp Suggested-by: Hayes Wang Acked-by: Francois Romieu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit cfc2c996141d487273c63af8954a3f3f03b16229 Author: Cyril Brulebois Date: Wed Oct 31 14:00:46 2012 +0000 r8169: Fix WoL on RTL8168d/8111d. commit b00e69dee4ccbb3a19989e3d4f1385bc2e3406cd upstream. This regression was spotted between Debian squeeze and Debian wheezy kernels (respectively based on 2.6.32 and 3.2). More info about Wake-on-LAN issues with Realtek's 816x chipsets can be found in the following thread: http://marc.info/?t=132079219400004 Probable regression from d4ed95d796e5126bba51466dc07e287cebc8bd19; more chipsets are likely affected. Tested on top of a 3.2.23 kernel. Reported-by: Florent Fourcot Tested-by: Florent Fourcot Hinted-by: Francois Romieu Signed-off-by: Cyril Brulebois Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f5a93eaffacc75908abcb9b94e2367a824d5953f Author: Mojiong Qiu Date: Tue Nov 6 16:08:15 2012 +0800 xen/events: fix RCU warning, or Call idle notifier after irq_enter() commit 772aebcefeff310f80e32b874988af0076cb799d upstream. exit_idle() should be called after irq_enter(), otherwise it throws: [ INFO: suspicious RCU usage. ] 3.6.5 #1 Not tainted ------------------------------- include/linux/rcupdate.h:725 rcu_read_lock() used illegally while idle! other info that might help us debug this: RCU used illegally from idle CPU! 
rcu_scheduler_active = 1, debug_locks = 1 RCU used illegally from extended quiescent state! 1 lock held by swapper/0/0: #0: (rcu_read_lock){......}, at: [] __atomic_notifier_call_chain+0x0/0x140 stack backtrace: Pid: 0, comm: swapper/0 Not tainted 3.6.5 #1 Call Trace: [] lockdep_rcu_suspicious+0xe2/0x130 [] __atomic_notifier_call_chain+0x12c/0x140 [] ? atomic_notifier_chain_unregister+0x90/0x90 [] ? trace_hardirqs_off+0xd/0x10 [] atomic_notifier_call_chain+0x16/0x20 [] exit_idle+0x43/0x50 [] xen_evtchn_do_upcall+0x25/0x50 [] xen_do_hypervisor_callback+0x1e/0x30 [] ? hypercall_page+0x3aa/0x1000 [] ? hypercall_page+0x3aa/0x1000 [] ? xen_safe_halt+0x10/0x20 [] ? default_idle+0xba/0x570 [] ? cpu_idle+0xdf/0x140 [] ? rest_init+0x135/0x144 [] ? csum_partial_copy_generic+0x16c/0x16c [] ? start_kernel+0x3db/0x3e8 [] ? repair_env_string+0x5a/0x5a [] ? x86_64_start_reservations+0x131/0x135 [] ? xen_start_kernel+0x465/0x46 Git commit 98ad1cc14a5c4fd658f9d72c6ba5c86dfd3ce0d5 Author: Frederic Weisbecker Date: Fri Oct 7 18:22:09 2011 +0200 x86: Call idle notifier after irq_enter() did this, but it missed the Xen code. Signed-off-by: Mojiong Qiu Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman commit 56cb8f7e4324410e9acb0b307efd2f846c06cf69 Author: Michal Schmidt Date: Sun Sep 9 13:55:26 2012 +0000 r8169: use unlimited DMA burst for TX commit aee77e4accbeb2c86b1d294cd84fec4a12dde3bd upstream. The r8169 driver currently limits the DMA burst for TX to 1024 bytes. I have a box where this prevents the interface from using the gigabit line to its full potential. This patch solves the problem by setting TX_DMA_BURST to unlimited. The box has an ASRock B75M motherboard with on-board RTL8168evl/8111evl (XID 0c900880). TSO is enabled. I used netperf (TCP_STREAM test) to measure the dependency of TX throughput on MTU. I did it for three different values of TX_DMA_BURST ('5'=512, '6'=1024, '7'=unlimited). 
This chart shows the results: http://michich.fedorapeople.org/r8169/r8169-effects-of-TX_DMA_BURST.png Interesting points: - With the current DMA burst limit (1024): - at the default MTU=1500 I get only 842 Mbit/s. - when going from small MTU, the performance rises monotonically with increasing MTU only up to a peak at MTU=1076 (908 MBit/s). Then there's a sudden drop to 762 MBit/s from which the throughput rises monotonically again with further MTU increases. - With a smaller DMA burst limit (512): - there's a similar peak at MTU=1076 and another one at MTU=564. - With unlimited DMA burst: - at the default MTU=1500 I get nice 940 Mbit/s. - the throughput rises monotonically with increasing MTU with no strange peaks. Notice that the peaks occur at MTU sizes that are multiples of the DMA burst limit plus 52. Why 52? Because: 20 (IP header) + 20 (TCP header) + 12 (TCP options) = 52 The Realtek-provided r8168 driver (v8.032.00) uses unlimited TX DMA burst too, except for CFG_METHOD_1 where the TX DMA burst is set to 512 bytes. CFG_METHOD_1 appears to be the oldest MAC version of "RTL8168B/8111B", i.e. RTL_GIGA_MAC_VER_11 in r8169. Not sure if this MAC version really needs the smaller burst limit, or if any other versions have similar requirements. Signed-off-by: Michal Schmidt Acked-by: Francois Romieu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ec5924204b7fa09a679c4456ac8e2176a2950ce6 Author: Hugh Dickins Date: Fri Nov 16 14:15:04 2012 -0800 tmpfs: change final i_blocks BUG to WARNING commit 0f3c42f522dc1ad7e27affc0a4aa8c790bce0a66 upstream. Under a particular load on one machine, I have hit shmem_evict_inode()'s BUG_ON(inode->i_blocks), enough times to narrow it down to a particular race between swapout and eviction. It comes from the "if (freed > 0)" asymmetry in shmem_recalc_inode(), and the lack of coherent locking between mapping's nrpages and shmem's swapped count. 
    There's a window in shmem_writepage(), between lowering nrpages in
    shmem_delete_from_page_cache() and then raising swapped count, when
    the freed count appears to be +1 when it should be 0, and then the
    asymmetry stops it from being corrected with -1 before hitting the
    BUG.

    One answer is coherent locking: using tree_lock throughout, without
    info->lock; reasonable, but the raw_spin_lock in percpu_counter_add()
    on used_blocks makes that messier than expected.

    Another answer may be a further effort to eliminate the weird
    shmem_recalc_inode() altogether, but previous attempts at that
    failed.

    So far undecided, but for now change the BUG_ON to WARN_ON: in usual
    circumstances it remains a useful consistency check.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit f86c309e326f800e8fb028df35051aac5a26ef8d
Author: Tom Herbert
Date:   Fri Nov 16 09:04:15 2012 +0000

    net-rps: Fix brokeness causing OOO packets

    [ Upstream commit baefa31db2f2b13a05d1b81bdf2d20d487f58b0a ]

    In commit c445477d74ab3779 which adds aRFS to the kernel, the CPU
    selected for RFS is not set correctly when CPU is changing. This is
    causing OOO packets and probably other issues.

    Signed-off-by: Tom Herbert
    Acked-by: Eric Dumazet
    Acked-by: Ben Hutchings
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 4e95708469b3b36b123c9ebfe4fed2b9de601e7a
Author: Jiri Pirko
Date:   Wed Nov 14 02:51:04 2012 +0000

    net: correct check in dev_addr_del()

    [ Upstream commit a652208e0b52c190e57f2a075ffb5e897fe31c3b ]

    Check (ha->addr == dev->dev_addr) is always true because
    dev_addr_init() sets this. Correct the check to behave properly on
    addr removal.

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 798f49e632f38abcb51dea6014b078798c1cc6ee
Author: Hannes Frederic Sowa
Date:   Sat Nov 10 19:52:34 2012 +0000

    ipv6: setsockopt(IPIPPROTO_IPV6, IPV6_MINHOPCOUNT) forgot to set return value

    [ Upstream commit d4596bad2a713fcd0def492b1960e6d899d5baa8 ]

    Cc: Stephen Hemminger
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 0fa89335e494872552e8e1d4e594c18559922e5c
Author: Xi Wang
Date:   Sun Nov 11 11:20:01 2012 +0000

    ipv4: avoid undefined behavior in do_ip_setsockopt()

    [ Upstream commit 0c9f79be295c99ac7e4b569ca493d75fdcc19e4e ]

    (1<

    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 999d6a5194c9707b081869f40306ac5e1727edf4
Author: Andreas Schwab
Date:   Sat Nov 17 22:27:04 2012 +0100

    m68k: fix sigset_t accessor functions

    commit 34fa78b59c52d1db3513db4c1a999db26b2e9ac2 upstream.

    The sigaddset/sigdelset/sigismember functions that are implemented
    with bitfield insn cannot allow the sigset argument to be placed in a
    data register since the sigset is wider than 32 bits. Remove the "d"
    constraint from the asm statements.

    The effect of the bug is that sending RT signals does not work, the
    signal number is truncated modulo 32.

    Signed-off-by: Andreas Schwab
    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Greg Kroah-Hartman

commit dd361b498741bd9b23a2f2238cf1c6af6cd49b67
Author: Johannes Berg
Date:   Mon Nov 12 10:51:34 2012 +0100

    wireless: allow 40 MHz on world roaming channels 12/13

    commit 43c771a1963ab461a2f194e3c97fded1d5fe262f upstream.

    When in world roaming mode, allow 40 MHz to be used on channels 12
    and 13 so that an AP that is, e.g., using HT40+ on channel 9 (in the
    UK) can be used.

    Reported-by: Eddie Chapman
    Tested-by: Eddie Chapman
    Acked-by: Luis R. Rodriguez
    Signed-off-by: Johannes Berg
    Signed-off-by: Greg Kroah-Hartman

commit e5c4ee6a081ec04e8ba54b00a4385391ab77e2f8
Author: Michal Hocko
Date:   Fri Nov 16 14:14:49 2012 -0800

    memcg: oom: fix totalpages calculation for memory.swappiness==0

    commit 9a5a8f19b43430752067ecaee62fc59e11e88fa6 upstream.

    oom_badness() takes a totalpages argument which says how many pages
    are available and it uses it as a base for the score calculation. The
    value is calculated by mem_cgroup_get_limit which considers both
    limit and total_swap_pages (resp. memsw portion of it).

    This is usually correct but since fe35004fbf9e ("mm: avoid swapping
    out with swappiness==0") we do not swap when swappiness is 0 which
    means that we cannot really use up all the totalpages pages. This in
    turn confuses oom score calculation if the memcg limit is much
    smaller than the available swap because the used memory (capped by
    the limit) is negligible comparing to totalpages so the resulting
    score is too small if adj!=0 (typically task with CAP_SYS_ADMIN or
    non zero oom_score_adj). A wrong process might be selected as result.

    The problem can be worked around by checking
    mem_cgroup_swappiness==0 and not considering swap at all in such a
    case.

    Signed-off-by: Michal Hocko
    Acked-by: David Rientjes
    Acked-by: Johannes Weiner
    Acked-by: KOSAKI Motohiro
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit 9098e8785685bd245027447b5628fa872babd3b8
Author: Zhao Yakui
Date:   Tue Nov 13 18:31:55 2012 +0000

    ttm: Clear the ttm page allocated from high memory zone correctly

    commit ac207ed2471150e06af0afc76e4becc701fa2733 upstream.

    The TTM page can be allocated from high memory. In such case it is
    wrong to use the page_address(page) as the virtual address for the
    high memory page.
    bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=50241

    Signed-off-by: Zhao Yakui
    Reviewed-by: Thomas Hellstrom
    Signed-off-by: Dave Airlie
    Signed-off-by: Greg Kroah-Hartman

commit cfb62e2f3c407c91a4c4707006ad47183baf4ff1
Author: Alex Deucher
Date:   Wed Nov 14 09:10:39 2012 -0500

    drm/radeon: fix logic error in atombios_encoders.c

    commit b9196395c905edec512dfd6690428084228c16ec upstream.

    Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=50431

    Reported-by: David Binderman
    Signed-off-by: Alex Deucher
    Reviewed-by: Michel Dänzer
    Signed-off-by: Greg Kroah-Hartman

commit 3bf0d107e137043aa3c6a872ee542b3f0e85a30a
Author: Dan Williams
Date:   Thu Nov 8 11:56:53 2012 -0600

    USB: option: add Alcatel X220/X500D USB IDs

    commit c0bc3098871dd9b964f6b45ec1e4d70d87811744 upstream.

    Signed-off-by: Dan Williams
    Signed-off-by: Greg Kroah-Hartman

commit 47ccee3020fc998e27dbdd04cc575bfb6b927886
Author: Dan Williams
Date:   Thu Nov 8 11:56:42 2012 -0600

    USB: option: add Novatel E362 and Dell Wireless 5800 USB IDs

    commit fcb21645f1bd86d2be29baf48aa1b298de52ccc7 upstream.

    The Dell 5800 appears to be a simple rebrand of the Novatel E362.

    Signed-off-by: Dan Williams
    Signed-off-by: Greg Kroah-Hartman

commit 3e31ee155e0963e42bb02d33a2a9e44d5b91a45b
Author: Heiko Carstens
Date:   Mon Oct 22 15:49:02 2012 +0200

    s390/gup: add missing TASK_SIZE check to get_user_pages_fast()

    commit d55c4c613fc4d4ad2ba0fc6fa2b57176d420f7e4 upstream.

    When walking page tables we need to make sure that everything is
    within bounds of the ASCE limit of the task's address space.
    Otherwise we might calculate e.g. a pud pointer which is not within a
    pud and dereference it. So check against TASK_SIZE (which is the ASCE
    limit) before walking page tables.
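The kind of bounds check described for get_user_pages_fast() can be sketched in userspace C as below. This is only an illustration, not the s390 code: the helper name `range_within_limit()` is hypothetical, and the `task_size` parameter stands in for TASK_SIZE (the ASCE limit). The idea is to reject a range that wraps around or extends past the limit before any table walk starts.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): returns true only when the byte
 * range [start, start + len) neither wraps around nor extends past
 * task_size, which stands in for the ASCE limit (TASK_SIZE).
 */
bool range_within_limit(uint64_t start, uint64_t len, uint64_t task_size)
{
    uint64_t end = start + len;   /* unsigned arithmetic wraps on overflow */

    if (end < start)              /* wrapped: the range is not representable */
        return false;
    if (end > task_size)          /* reaches beyond the address-space limit */
        return false;
    return true;
}
```

A caller would take its slow or error path whenever the helper returns false, so that no out-of-bounds table pointer is ever computed.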
    Reviewed-by: Gerald Schaefer
    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Greg Kroah-Hartman

commit 87dd2c484ae11e2e0a34996d96bfbfaf20f57b45
Author: Colin Cross
Date:   Wed Nov 7 18:21:51 2012 -0800

    Revert "Staging: Android alarm: IOCTL command encoding fix"

    commit d38e0e3fed4f58bcddef4dc93a591dfe2f651cb0 upstream.

    Commit 6bd4a5d96c08dc2380f8053b1bd4f879f55cd3c9 changed the
    ANDROID_ALARM_GET_TIME ioctls from IOW to IOR. While technically
    correct, the _IOC_DIR bits are ignored by alarm_ioctl, so the commit
    breaks a userspace ABI used by all existing Android devices for a
    purely cosmetic reason. Revert it.

    Cc: Dae S. Kim
    Signed-off-by: Colin Cross
    Signed-off-by: Greg Kroah-Hartman

commit 07126218d445ea1fea9952c6f801e5d05117e223
Author: Artem Bityutskiy
Date:   Wed Oct 10 10:55:28 2012 +0300

    UBIFS: introduce categorized lprops counter

    commit 98a1eebda3cb2a84ecf1f219bb3a95769033d1bf upstream.

    This commit is a preparation for a subsequent bugfix. We introduce a
    counter for categorized lprops.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Greg Kroah-Hartman

commit 151de639411e60bbf9d8fbb5e7b78bb2c5eb0972
Author: Artem Bityutskiy
Date:   Tue Oct 9 16:20:15 2012 +0300

    UBIFS: fix mounting problems after power cuts

    commit a28ad42a4a0c6f302f488f26488b8b37c9b30024 upstream.

    This is a bugfix for a problem with the following symptoms:

    1. A power cut happens
    2. After reboot, we try to mount UBIFS
    3. Mount fails with "No space left on device" error message

    UBIFS complains like this:

    UBIFS error (pid 28225): grab_empty_leb: could not find an empty LEB

    The root cause of this problem is that when we mount, not all LEBs
    are categorized. Only those which were read are. However, the
    'ubifs_find_free_leb_for_idx()' function assumes that all LEBs were
    categorized and 'c->freeable_cnt' is valid, which is a false
    assumption.
    This patch fixes the problem by teaching
    'ubifs_find_free_leb_for_idx()' to always fall back to LPT scanning
    if no freeable LEBs were found.

    This problem was reported by few people in the past, but Brent Taylor
    was able to reproduce it and send me a flash image which cannot be
    mounted, which made it easy to hunt the bug. Kudos to Brent.

    Reported-by: Brent Taylor
    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Greg Kroah-Hartman

commit 647a9dbec6ab9bf9789b2dcfb589cf4dbb13e379
Author: Misael Lopez Cruz
Date:   Thu Nov 8 12:03:12 2012 -0600

    ASoC: dapm: Use card_list during DAPM shutdown

    commit 445632ad6dda42f4d3f9df2569a852ca0d4ea608 upstream.

    DAPM shutdown incorrectly uses "list" field of codec struct while
    iterating over probed components (codec_dev_list). "list" field
    refers to codecs registered in the system, "card_list" field is used
    for probed components.

    Signed-off-by: Misael Lopez Cruz
    Signed-off-by: Liam Girdwood
    Signed-off-by: Mark Brown
    Signed-off-by: Greg Kroah-Hartman

commit b6d806539cdf76b1d0ad2411dff4cae227de4942
Author: Eric Millbrandt
Date:   Fri Nov 2 17:05:44 2012 -0400

    ASoC: wm8978: pll incorrectly configured when codec is master

    commit 55c6f4cb6ef49afbb86222c6a3ff85329199c729 upstream.

    When MCLK is supplied externally and BCLK and LRC are configured as
    outputs (codec is master), the PLL values are only calculated
    correctly on the first transmission. On subsequent transmissions, at
    different sample rates, the wrong PLL values are used. Test for
    f_opclk instead of f_pllout to determine if the PLL values are
    needed.

    Signed-off-by: Eric Millbrandt
    Signed-off-by: Mark Brown
    Signed-off-by: Greg Kroah-Hartman

commit c2628a3b472500b59def16491041e6b13bd5309e
Author: Takashi Iwai
Date:   Mon Nov 12 10:07:36 2012 +0100

    ALSA: hda - Add a missing quirk entry for iMac 9,1

    commit 05193639ca977cc889668718adb38db6d585045b upstream.

    This is another variant of iMac 9,1 with a different codec SSID.
    Reported-and-tested-by: Everaldo Canuto
    Signed-off-by: Takashi Iwai
    Signed-off-by: Greg Kroah-Hartman

commit 2302f5e1a612ada83bb179203736008bac383fc7
Author: Kailang Yang
Date:   Thu Nov 8 10:25:37 2012 +0100

    ALSA: hda - Add new codec ALC668 and ALC900 (default name ALC1150)

    commit 19a62823eae453619604636082085812c14ee391 upstream.

    Signed-off-by: Kailang Yang
    Signed-off-by: Takashi Iwai
    Signed-off-by: Greg Kroah-Hartman

commit d2d0adce41aacfe4b08dee4fd32daad490c8ed3e
Author: Takashi Iwai
Date:   Wed Nov 7 10:37:48 2012 +0100

    ALSA: hda - Fix invalid connections in VT1802 codec

    commit ef4da45828603df57e5e21b8aa21a66ce309f79b upstream.

    VT1802 codec provides the invalid connection lists of NID 0x24 and
    0x33 containing the routes to a non-exist widget 0x3e. This confuses
    the auto-parser. Fix it up in the driver by overriding these
    connections.

    Reported-by: Massimo Del Fedele
    Signed-off-by: Takashi Iwai
    Signed-off-by: Greg Kroah-Hartman

commit 9562c79dcc0f30e353b52936b0408114c269d350
Author: Takashi Iwai
Date:   Wed Nov 7 10:32:47 2012 +0100

    ALSA: hda - Fix empty DAC filling in patch_via.c

    commit 5b3761954dac2d1393beef8210eb8cee81d16b8d upstream.

    In via_auto_fill_adc_nids(), the parser tries to fill dac_nids[] at
    the point of the current line-out (i). When no valid path is found
    for this output, this results in dac = 0, thus it creates a hole in
    dac_nids[]. This confuses is_empty_dac() and trims the detected DAC
    in later reference.

    This patch fixes the bug by appending DAC properly to dac_nids[] in
    via_auto_fill_adc_nids().

    Reported-by: Massimo Del Fedele
    Signed-off-by: Takashi Iwai
    Signed-off-by: Greg Kroah-Hartman

commit f283494905eb2aab38911af461563c985c1a1115
Author: Takashi Iwai
Date:   Mon Nov 5 12:32:46 2012 +0100

    ALSA: hda - Force to reset IEC958 status bits for AD codecs

    commit ae24c3191ba2ab03ec6b4be323e730e00404b4b6 upstream.
    Several bug reports suggest that the forcibly resetting IEC958 status
    bits is required for AD codecs to get the SPDIF output working
    properly after changing streams.

    Original fix credit to Javeed Shaikh.

    BugLink: https://bugs.launchpad.net/ubuntu/+source/alsa-driver/+bug/359361

    Reported-by: Robin Kreis
    Signed-off-by: Takashi Iwai
    Signed-off-by: Greg Kroah-Hartman

commit 1289a3ebfb5de7d1eb093352e977fd5dbb414946
Author: Daniel J Blueman
Date:   Sun Nov 4 13:19:03 2012 +0800

    ALSA: HDA: Fix digital microphone on CS420x

    commit 16337e028a6dae9fbdd718c0d42161540a668ff3 upstream.

    Correctly enable the digital microphones with the right bits in the
    right coefficient registers on Cirrus CS4206/7 codecs. It also
    prevents misconfiguring ADC1/2.

    This fixes the digital mic on the Macbook Pro 10,1/Retina.

    Based-on-patch-by: Alexander Stein
    Signed-off-by: Daniel J Blueman
    Signed-off-by: Takashi Iwai
    Signed-off-by: Greg Kroah-Hartman

commit 2a1d20c8d31dd4d2b83838e4de3e15ebbb442806
Author: Alexander Stein
Date:   Thu Nov 1 13:42:37 2012 +0100

    ALSA: hda: Cirrus: Fix coefficient index for beep configuration

    commit 5a83b4b5a391f07141b157ac9daa51c409e71ab5 upstream.

    Signed-off-by: Alexander Stein
    Signed-off-by: Takashi Iwai
    Signed-off-by: Greg Kroah-Hartman

commit 30966dbf8217223b017514ca684c633e5c31a13a
Author: Jussi Kivilinna
Date:   Sun Oct 21 20:42:28 2012 +0300

    crypto: cryptd - disable softirqs in cryptd_queue_worker to prevent data corruption

    commit 9efade1b3e981f5064f9db9ca971b4dc7557ae42 upstream.

    cryptd_queue_worker attempts to prevent simultaneous accesses to
    crypto workqueue by cryptd_enqueue_request using
    preempt_disable/preempt_enable. However cryptd_enqueue_request might
    be called from softirq context, so add
    local_bh_disable/local_bh_enable to prevent data corruption and
    panics.
    Bug report at
    http://marc.info/?l=linux-crypto-vger&m=134858649616319&w=2

    v2:
     - Disable software interrupts instead of hardware interrupts

    Reported-by: Gurucharan Shetty
    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu
    Signed-off-by: Greg Kroah-Hartman

commit bb60c07c75a9087303779324be4e5de771a7687b
Author: Jeff Layton
Date:   Sat Nov 3 09:37:28 2012 -0400

    cifs: fix potential buffer overrun in cifs.idmap handling code

    commit 36960e440ccf94349c09fb944930d3bfe4bc473f upstream.

    The userspace cifs.idmap program generally works with the wbclient
    libs to generate binary SIDs in userspace. That program defines the
    struct that holds these values as having a max of 15 subauthorities.
    The kernel idmapping code however limits that value to 5.

    When the kernel copies those values around though, it doesn't sanity
    check the num_subauths value handed back from userspace or from the
    server. It's possible therefore for userspace to hand us back a bogus
    num_subauths value (or one that's valid, but greater than 5) that
    could cause the kernel to walk off the end of the cifs_sid->sub_auths
    array.

    Fix this by defining a new routine for copying sids and using that in
    all of the places that copy it. If we end up with a sid that's longer
    than expected then this approach will just lop off the "extra"
    subauths, but that's basically what the code does today already.
    Better approaches might be to fix this code to reject SIDs with >5
    subauths, or fix it to handle the subauths array dynamically.

    At the same time, change the kernel to check the length of the data
    returned by userspace. If it's shorter than struct cifs_sid, reject
    it and return -EIO. If that happens we'll end up with fields that are
    basically uninitialized.

    Long term, it might make sense to redefine cifs_sid using a flexarray
    at the end, to allow for variable-length subauth lists, and teach the
    code to handle the case where the subauths array being passed in from
    userspace is shorter than 5 elements.
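The copy routine and length check described above can be sketched in userspace C as follows. The struct layout, the `MAX_SUBAUTHS` constant, and the names `copy_sid()` and `sid_from_blob()` are assumptions made for illustration; they are not the kernel's definitions. The sketch only shows the two ideas from the text: clamp the sub-authority count on every copy, and reject input shorter than the fixed-size struct.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SUBAUTHS 5          /* kernel-side limit named in the text */

/* Hypothetical, simplified SID layout for illustration only. */
struct sid {
    uint8_t  revision;
    uint8_t  num_subauth;
    uint8_t  authority[6];
    uint32_t sub_auth[MAX_SUBAUTHS];
};

/* Copy a SID, clamping the sub-authority count to what dst can hold. */
void copy_sid(struct sid *dst, const struct sid *src)
{
    memcpy(dst, src, sizeof(*dst));
    if (dst->num_subauth > MAX_SUBAUTHS)
        dst->num_subauth = MAX_SUBAUTHS;   /* lop off the "extra" subauths */
}

/*
 * Accept a blob handed back from userspace only if it is at least as long
 * as the fixed-size struct. Returns 0 on success and -1 otherwise (the
 * commit message says the kernel returns -EIO in that case).
 */
int sid_from_blob(struct sid *dst, const void *blob, size_t len)
{
    struct sid tmp;

    if (len < sizeof(tmp))
        return -1;
    memcpy(&tmp, blob, sizeof(tmp));
    copy_sid(dst, &tmp);
    return 0;
}
```

Routing every copy through one helper means a later loop bounded by `num_subauth` cannot index past `sub_auth[]`, whatever count the peer supplied.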
    Note too, that I don't consider this a security issue since you'd
    need a compromised cifs.idmap program. If you have that, you can do
    all sorts of nefarious stuff. Still, this is probably reasonable for
    stable.

    Reviewed-by: Shirish Pargaonkar
    Signed-off-by: Jeff Layton
    Signed-off-by: Greg Kroah-Hartman

commit 0de578d014506a662b239708f4a0ded7e083292c
Author: Rusty Russell
Date:   Thu Oct 25 10:49:25 2012 +1030

    module: fix out-by-one error in kallsyms

    commit 59ef28b1f14899b10d6b2682c7057ca00a9a3f47 upstream.

    Masaki found and patched a kallsyms issue: the last symbol in a
    module's symtab wasn't transferred. This is because we manually copy
    the zero'th entry (which is always empty) then copy the rest in a
    loop starting at 1, though from src[0]. His fix was minimal, I prefer
    to rewrite the loops in more standard form.

    There are two loops: one to get the size, and one to copy. Make these
    identical: always count entry 0 and any defined symbol in an
    allocated non-init section.

    This bug exists since the following commit was introduced.
       module: reduce symbol table for loaded modules (v2)
       commit: 4a4962263f07d14660849ec134ee42b63e95ea9a

    LKML: http://lkml.org/lkml/2012/10/24/27

    Reported-by: Masaki Kimura
    Signed-off-by: Rusty Russell
    Signed-off-by: Greg Kroah-Hartman

commit a6a19e36b35fcb17d6c2bf9a49014e9461dfcef2
Author: Eric Paris
Date:   Thu Nov 8 15:53:37 2012 -0800

    fanotify: fix missing break

    commit 848561d368751a1c0f679b9f045a02944506a801 upstream.

    Anders Blomdell noted in 2010 that Fanotify lost events and provided
    a test case. Eric Paris confirmed it was a bug and posted a fix to
    the list

      https://groups.google.com/forum/?fromgroups=#!topic/linux.kernel/RrJfTfyW2BE

    but never applied it.
    Repeated attempts over time to actually get him to apply it have
    never had a reply from anyone who has raised it

    So apply it anyway

    Signed-off-by: Alan Cox
    Reported-by: Anders Blomdell
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit bb1d6871282e191752d4d8396808ed25ac3500fb
Author: Felix Fietkau
Date:   Sat Nov 10 03:44:14 2012 +0100

    mac80211: call skb_dequeue/ieee80211_free_txskb instead of __skb_queue_purge

    commit 1f98ab7fef48a2968f37f422c256c9fbd978c3f0 upstream.

    Fixes more wifi status skb leaks, leading to hostapd/wpa_supplicant
    hangs.

    Signed-off-by: Felix Fietkau
    Signed-off-by: Johannes Berg
    Signed-off-by: Greg Kroah-Hartman

commit ab4e547760de676be22d61c5d73230356b383a31
Author: Johannes Berg
Date:   Thu Nov 8 14:06:28 2012 +0100

    mac80211: don't send null data packet when not associated

    commit 20f544eea03db4b498942558b882d463ce575c3e upstream.

    On resume or firmware recovery, mac80211 sends a null data packet to
    see if the AP is still around and hasn't disconnected us. However, it
    always does this even if it wasn't even connected before, leading to
    a warning in the new channel context code. Fix this by checking that
    it's associated.

    Reviewed-by: Emmanuel Grumbach
    Signed-off-by: Johannes Berg
    Signed-off-by: Greg Kroah-Hartman

commit 1cbc74e7edf24ac527c502e3e936bc4350ec4622
Author: Arik Nemtsov
Date:   Mon Nov 5 10:27:52 2012 +0200

    mac80211: sync acccess to tx_filtered/ps_tx_buf queues

    commit 987c285c2ae2e4e32aca3a9b3252d28171c75711 upstream.

    These are accessed without a lock when ending STA PSM. If the
    sta_cleanup timer accesses these lists at the same time, we might
    crash.

    This may fix some mysterious crashes we had during
    ieee80211_sta_ps_deliver_wakeup.
    Signed-off-by: Arik Nemtsov
    Signed-off-by: Ido Yariv
    Signed-off-by: Johannes Berg
    Signed-off-by: Greg Kroah-Hartman

commit 0244d2ed647d788233ad72d32986bc0ee9a71d5b
Author: Dave Chinner
Date:   Mon Nov 12 22:09:46 2012 +1100

    xfs: drop buffer io reference when a bad bio is built

    commit d69043c42d8c6414fa28ad18d99973aa6c1c2e24 upstream.

    Error handling in xfs_buf_ioapply_map() does not handle IO reference
    counts correctly. We increment the b_io_remaining count before
    building the bio, but then fail to decrement it in the failure case.
    This leads to the buffer never running IO completion and releasing
    the reference that the IO holds, so at unmount we can leak the
    buffer. This leak is captured by this assert failure during unmount:

    XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 273

    This is not a new bug - the b_io_remaining accounting has had this
    problem for a long, long time - it's just very hard to get a zero
    length bio being built by this code...

    Further, the buffer IO error can be overwritten on a multi-segment
    buffer by subsequent bio completions for partial sections of the
    buffer. Hence we should only set the buffer error status if the
    buffer is not already carrying an error status. This ensures that a
    partial IO error on a multi-segment buffer will not be lost. This
    part of the problem is a regression, however.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers
    Signed-off-by: Greg Kroah-Hartman

commit 3f874ecc49208f3635e29c3a08ade01778ad85e4
Author: Takamori Yamaguchi
Date:   Thu Nov 8 15:53:39 2012 -0800

    mm: bugfix: set current->reclaim_state to NULL while returning from kswapd()

    commit b0a8cc58e6b9aaae3045752059e5e6260c0b94bc upstream.

    In kswapd(), set current->reclaim_state to NULL before returning, as
    current->reclaim_state holds reference to variable on kswapd()'s
    stack.
    In rare cases, while returning from kswapd() during memory
    offlining, __free_slab() and freepages() can access the dangling
    pointer of current->reclaim_state.

    Signed-off-by: Takamori Yamaguchi
    Signed-off-by: Aaditya Kumar
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman
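
The dangling-pointer pattern fixed in kswapd() can be illustrated with a small userspace sketch. `struct task`, `struct reclaim_state` and `worker()` below are hypothetical stand-ins rather than the kernel's types. A long-lived object briefly points at a variable on the callee's stack, so the pointer has to be cleared before the frame goes away.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct reclaim_state {
    unsigned long reclaimed_slab;
};

struct task {
    struct reclaim_state *reclaim_state;  /* outlives any single call */
};

/*
 * Publishes a pointer to a stack variable for the duration of the work,
 * then clears it before returning so nothing can follow it into a dead
 * stack frame.
 */
unsigned long worker(struct task *t, unsigned long pages)
{
    struct reclaim_state rs = { .reclaimed_slab = 0 };

    t->reclaim_state = &rs;
    t->reclaim_state->reclaimed_slab += pages;   /* stand-in for real work */

    unsigned long total = rs.reclaimed_slab;
    t->reclaim_state = NULL;                     /* the fix: drop the reference */
    return total;
}
```

Without the final assignment, any code that later inspects `t->reclaim_state` would read memory belonging to a stack frame that no longer exists.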