The F-measure and its variants are performance measures of choice for evaluating classification and retrieval tasks in the presence of severe class imbalance. It is thus highly desirable to be able to directly optimize these performance measures on large-scale data. Recent advances have shown that this is possible in the simple binary classification setting. However, scant progress exists in multiclass settings with a large number of classes where, in addition, class-imbalance is much more severe. The lack of progress is especially conspicuous for the macro-Averaged F-measure, which is the widely preferred Fmeasure variant in multiclass settings due to its equal emphasis on rare classes. Known methods of optimization scale poorly for macro F-measure, often requiring run times that are exponential in the number of classes. We develop BEAM-F, the first efficient method for directly optimizing the macro F-measure in multiclass settings. The challenge here is the intractability of optimizing a sum of fractional-linear functions over the space of confusion matrices. We overcome this difficulty by formulating the problem as a biconcave maximization program and solve it using an efficient alternating maximization approach that involves a Frank- Wolfe based iterative solver. Our approach offers guaranteed convergence to a stationary point and experiments show that, for a range synthetic data sets and real-world applications, our method offers superior performance on problems exhibiting large class imbalance.